Enhanced Expression of RNA Vectors

ABSTRACT

The present invention relates to methods and compositions for enhancing expression from RNA expression vectors. The invention is based upon the observation that reducing the frequency of the dinucleotides CpG and UpA significantly enhances expression from such vectors. Aspects of the invention include, amongst others, synthetic RNA vectors, virions, cells, methods of producing vaccines and methods of treatment or immunisation.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. application Ser. No. 14/779,069, filed on Sep. 22, 2015, now U.S. Patent Application No. US2016/0053281, pursuant to 35 U.S.C. 371 of International Application No. PCT/GB2014/050917, filed on Mar. 24, 2014, published in English as WO2014/155076 and entitled “Enhanced Expression of RNA Vectors”. This application further claims the benefit of foreign priority to GB1305361.6, entitled “Enhanced Expression” and filed on Mar. 25, 2013, through PCT/GB2014/050917 filed on Mar. 24, 2014. The entire contents of the aforementioned patent applications are incorporated herein by this reference.

BACKGROUND OF THE INVENTION

The present invention relates to methods for enhancing the expression of proteins by modifying the nucleotide composition of the encoding nucleic acid. In particular, it relates to methods for enhancing the expression of RNA expression vectors by reducing the frequency of CpG and/or UpA dinucleotides. The invention also relates to nucleic acids modified in such a way, and to systems in which such nucleic acids are used.

The base composition of DNA of mammals and other eukaryotes shows evidence of complex selection pressures and mutational mechanisms. In vertebrates, regions of extensive under-representation of CpG dinucleotides (i.e. C followed by G) are found throughout the genome (Russell et al. 1976, J. Mol. Biol. 108: 123). This is thought to originate largely from DNA methylation, which has a mutagenic effect on the cytosine residue.

For example, in the human genome, which has a mean G+C content of 42%, C and G would each be expected to account for 21% of bases, so a pair of nucleotides consisting of cytosine followed by guanine would be expected to occur 0.21 × 0.21 = 0.0441 (about 4.4%) of the time. The actual frequency of CpG dinucleotides in human genomes is about 1%, less than one-quarter of the expected frequency. It has been proposed that the CpG deficiency is due to an increased vulnerability of methylcytosines to spontaneous deamination to thymine in genomes with CpG cytosine methylation.
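The expected-frequency arithmetic in the paragraph above can be reproduced in a few lines. This is a minimal Python sketch that assumes, as the calculation does, that C and G each make up half of the G+C content and that neighbouring bases occur independently; the function name is illustrative only.

```python
def expected_cpg_frequency(gc_content: float) -> float:
    """Expected fraction of dinucleotide positions occupied by CpG.

    Assumes C and G each account for half of the G+C content and that
    adjacent bases are independent of one another.
    """
    p_c = p_g = gc_content / 2
    return p_c * p_g


expected = expected_cpg_frequency(0.42)
quarter_of_expected = expected / 4

print(f"expected CpG frequency: {expected:.4f}")             # 0.0441
print(f"one-quarter of expected: {quarter_of_expected:.4f}")  # 0.0110
```

On these assumptions, an observed CpG frequency below roughly 0.011 corresponds to "less than one-quarter of the expected frequency".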

In a recent large-scale bioinformatic analysis of various eukaryotic groups that show differing degrees of genomic DNA methylation, evidence was found for further mutational mechanisms operating on genomic DNA and for strong selection against UpA and CpG dinucleotides among the subset of genomic DNA sequences that are transcribed as RNA and transported to the cytoplasm (Simmonds et al. 2013, BMC Genomics, 14, 610).

A similar selection process was identified in RNA viruses infecting mammals and plants that potentially accounts for their previously described, but unexplained, under-representation of these dinucleotides (Rima and McFerran 1997, J. Gen. Virol. 78: 2859-2870). The nature of the selection against CpG and UpA dinucleotides is poorly understood and has not been investigated functionally to date.

Further evidence that CpG dinucleotides in viral sequences either activate or are targets of cell defence mechanisms is provided by the observation that polioviruses with artificially elevated CpG frequencies in their genomic RNA were markedly attenuated and replicated to titres several orders of magnitude lower than wild type virus in in vitro cell culture (1-3). This effect was independent of changes in translation efficiency through alteration of codon usage and codon pair bias.

The attenuation of poliovirus with artificially elevated CpG frequencies in its genomic RNA was additionally unrelated to differences in Toll-like receptor 9 (TLR9) signalling, as the poliovirus genome consists of RNA, which is not a substrate for TLR9. This contrasts with DNA-based expression systems, in which the CpG content is reduced or eliminated to enhance expression through avoidance of TLR9-induced activation of transfected cells. For example, the pCpGfree DNA plasmid vectors from Invivogen (San Diego, Calif.) are CpG-free, and Invivogen also provides a service in which it will create a CpG-free DNA version of a gene of interest and insert it into a pCpGfree DNA plasmid. The rationale behind this technology is that bacterial DNA is rich in unmethylated CpG dinucleotides, in contrast to mammalian DNA, which contains a low frequency of CpG dinucleotides that are mostly methylated (Bauer et al. 2001, PNAS USA, 98(16): 9237-42). Unmethylated CpGs in specific sequence contexts activate the vertebrate immune system via Toll-like receptor (TLR) 9. TLR9 recognises CpG in DNA and initiates a signalling cascade leading to the production of pro-inflammatory cytokines such as IL-6 and IL-12. Plasmids used for in vivo experiments are produced in E. coli; their CpGs are therefore unmethylated and induce immune responses through this host defence mechanism, which represents a limitation for the clinical development of DNA vaccines and gene therapy vectors. Thus, this technology is limited to DNA-based expression systems. Furthermore, given that the TLR9 system acts only on DNA, there is no basis to believe that the rationale could extend to RNA-based expression systems.

There remains a need for improved systems for the expression of proteins encoded on RNA polynucleotides. In particular, there is a need to improve expression of RNA expression vectors, such as RNA viral vectors, in suitable expression systems.

SUMMARY OF THE INVENTION

According to the present invention there is provided a synthetic RNA expression vector comprising a sequence encoding an expression product, the nucleic acid comprising at least one region in which the nucleotide composition has been modified such that the frequency of CpG and/or UpA dinucleotides is reduced relative to normal frequency.

The term ‘synthetic RNA expression vector’ refers to a nucleic acid construct formed of RNA, the construct comprising a sequence encoding an expression product and at least one regulatory sequence (e.g. a promoter) to drive expression of the expression product. The synthetic RNA expression vector can be capable of replication in a host cell or it can be replication deficient. Where it is replication competent, the synthetic RNA expression vector can thus comprise virion control elements and coding regions to allow replication in a host cell.

Suitably the synthetic RNA expression vector is a recombinant RNA viral vector, e.g. a recombinant virus genome.

Preferably the frequency of both CpG and UpA dinucleotides is reduced relative to normal frequency.

Suitably the at least one region is at least 30 nucleotides in length, more preferably at least 100 nucleotides in length, yet more preferably at least 200 nucleotides in length and suitably at least 500 nucleotides in length. In some embodiments the at least one region can be over 1000 nucleotides in length. A given synthetic RNA expression vector according to the present invention can comprise one or more than one (e.g. 2, 3, 4, 5, 6, 7, 8, 9 or 10) regions in which the CpG and/or UpA frequency is reduced. One could view each base change to a wild type sequence as an individual ‘region’, and in some cases the present invention envisages such minor changes, e.g. in highly constrained synthetic RNA expression vectors. However, typically several changes are made within a longer region in order to bring about a more significant change in the frequency of CpG and/or UpA dinucleotides in a given synthetic RNA expression vector. For example, typically much or all of one or more ORFs will be modified to reduce the frequency of CpG and/or UpA dinucleotides. Many synthetic RNA expression vectors will comprise two or more ORFs, and in that case it is envisaged that regions corresponding to some or all of those ORFs will be modified.

The present inventors have made the unexpected discovery that reducing CpG or UpA levels below those found in wild type sequences enhances expression from RNA viral vectors. It is typically thought that the sequences of such RNA viruses in their natural contexts are optimised for expression, and hence replication, in their relevant environment. Sequences in nature have evolved such that the occurrence of CpG or UpA dinucleotides is reduced relative to the statistically expected number, for reasons that are not entirely clear, as discussed elsewhere in this application and in the various documents cited herein. It is known that increasing the occurrence of CpG or UpA dinucleotides in RNA can lead to a reduction in expression levels. However, it has not been proposed or discussed that reducing the occurrence of CpG or UpA in an RNA expression vector below that found in the wild type sequence leads to an increase in expression levels.

This finding is surprising because there is an assumption that wild type viral sequences are highly optimised for expression in their respective hosts. Thus one would not expect that creating a sequence which deviates significantly from the natural sequence, e.g. by reducing the occurrence of CpG and UpA dinucleotides to an artificially low level, would result in an increase in expression. It is well known that viruses are relatively free to evolve rapidly and that they are under strong evolutionary pressure to maximise their relationship with their host and avoid host defences. This results in viral expression systems rapidly becoming highly optimised for their host. In particular, the open reading frames (ORFs) of viral genomes encoding viral components are expected to have evolved to an optimum composition to maximise their relationship with their host. Indeed, it can be observed that the frequency of CpG and UpA dinucleotides in the ORFs of viral genomes typically closely mirrors that of their host. If a further reduction of CpG or UpA dinucleotide occurrence would further benefit expression of ORFs of viral genomes, then one would expect this to have occurred.

This is all the more relevant in the case of RNA viruses, in which the mutation rate is far higher than in DNA viruses because RNA polymerases lack the proof-reading mechanism of DNA polymerases.

As touched on above, the current understanding of the mechanism through which CpG dinucleotides affect expression is that TLR9 receptors recognise unmethylated CpG dinucleotides in DNA molecules and induce the innate immune system. However, TLR9 does not recognise CpG dinucleotides in RNA, and therefore this system would not be expected to have any effect on RNA-based expression vectors.

When one refers to the ‘frequency’ of a given dinucleotide, one is referring to the number of times it occurs in the relevant sequence. In a random sequence of sufficient length with equal frequencies of all 4 bases, one would expect any given dinucleotide to occur 1/16th of the time as a result of chance, there being 16 possible dinucleotides. As discussed above, in real world situations the normal frequency of any given dinucleotide in a given sequence is not random, because various pressures (some known, others not) act upon the sequence composition. Thus, the actual frequency of a given dinucleotide varies from the expected frequency; in the case of CpG and UpA in mammalian or plant genomes, it is typically reduced. The present invention is concerned with reducing the frequency of CpG and/or UpA dinucleotides below their normal frequency (i.e. the frequency with which they occur in their normal context) to improve expression.
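As an illustration of what ‘frequency’ means here, the minimal Python sketch below counts every overlapping dinucleotide in a sequence and compares the result with the 1/16 chance expectation. Counting overlapping positions is an assumed convention, and the function names and example sequence are hypothetical.

```python
from collections import Counter


def dinucleotide_counts(seq: str) -> Counter:
    """Count all overlapping dinucleotides in a nucleotide sequence.

    DNA input is accepted for convenience: T is converted to U so that
    counts are reported in RNA notation (e.g. 'UA' for UpA).
    """
    rna = seq.upper().replace("T", "U")
    return Counter(rna[i:i + 2] for i in range(len(rna) - 1))


def dinucleotide_frequency(seq: str, dinuc: str) -> float:
    """Fraction of dinucleotide positions occupied by `dinuc`."""
    counts = dinucleotide_counts(seq)
    total = sum(counts.values())
    return counts[dinuc] / total if total else 0.0


example = "ACGUACGUAA"
counts = dinucleotide_counts(example)
print(counts["CG"], counts["UA"])                        # 2 2
print(round(dinucleotide_frequency(example, "CG"), 3))   # 0.222
print(1 / 16)  # 0.0625, chance expectation with equal base composition
```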

‘Normal frequency’ in the context of the present invention refers to the frequency of occurrence of CpG or UpA in an unmodified sequence, typically a wild type sequence. For example, in the case of a gene, the number of CpG or UpA dinucleotides in the synthetic nucleic acid according to the present invention would be fewer than the number of CpG or UpA dinucleotides in the wild type gene as it occurs in nature.

Preferably the frequency of CpG dinucleotides in the at least one region in which the nucleotide composition has been modified is reduced by at least 50%, i.e. if the normal sequence of interest contained 100 CpG dinucleotides, then it is preferred that the modified sequence contains 50 CpG dinucleotides or fewer.

Preferably the frequency of UpA dinucleotides in the at least one region in which the nucleotide composition has been modified is reduced by at least 50%, i.e. if the normal sequence of interest contained 100 UpA dinucleotides, then it is preferred that the modified sequence contains 50 UpA dinucleotides or fewer.

Preferably the frequency of both CpG and UpA dinucleotides in the at least one region in which the nucleotide composition has been modified is reduced by at least 50%.

More preferably the frequency of CpG and/or UpA dinucleotides in the at least one region is reduced by at least 60%, more preferably by at least 70%, 75%, 80%, 85%, 90% or 95%, or by 100%.

In a particularly preferred embodiment of the present invention, the at least one region in which the nucleotide composition has been modified contains no CpG and/or UpA dinucleotides.

Considering the synthetic RNA expression vector as a whole, it is preferred that the frequency of CpG and/or UpA dinucleotides is reduced by at least 20%, more preferably by at least 30%, 50%, 60%, 70%, 80% or even 90% or higher.
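The percentage thresholds above, whether applied per region or to the vector as a whole, amount to a comparison of dinucleotide counts before and after modification. A minimal Python sketch follows; the helper names are hypothetical, and the toy sequences are not meant to encode any particular protein.

```python
def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Number of overlapping occurrences of `dinuc` in `seq`."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def percent_reduction(before: int, after: int) -> float:
    """Percentage reduction from a normal count to a modified count."""
    if before == 0:
        raise ValueError("no occurrences in the normal sequence to reduce")
    return 100.0 * (before - after) / before


def meets_threshold(before: int, after: int, minimum: float = 50.0) -> bool:
    """True if the reduction is at least `minimum` percent."""
    return percent_reduction(before, after) >= minimum


# The 100 -> 50 CpG example sits exactly on the 50% threshold.
print(meets_threshold(100, 50))   # True
print(meets_threshold(100, 51))   # False

# Counting from (toy, hypothetical) sequences rather than given totals:
normal = "ACGUACGUACGU"
modified = "ACCUACAUACGU"
before = count_dinucleotide(normal, "CG")
after = count_dinucleotide(modified, "CG")
print(before, after, round(percent_reduction(before, after), 1))  # 3 1 66.7
```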

Preferably the reduction of the frequency of CpG and/or UpA dinucleotides is achieved through the introduction of substitutions in the relevant region that do not alter the encoded protein (synonymous substitutions).

Given the degeneracy of the genetic code, it is typically possible to reduce CpG content to zero in coding sequences without altering the encoded amino acid sequence, i.e. by synonymous substitution. For UpA, the fact that tyrosine is encoded only by the UpA-containing codons UAU and UAC often precludes elimination of all UpA dinucleotides without alteration of the encoded amino acid sequence. In some cases it may be possible to work around this by altering the sequence to introduce a similar amino acid. Depending on the context, tyrosine can be substituted by other aromatic amino acids; in particular, phenylalanine is in many ways chemically similar, although it lacks the hydroxyl group of tyrosine.
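The paragraph above argues that CpG can normally be removed entirely by synonymous substitution, whereas UpA is constrained by the tyrosine codons. The sketch below makes that argument concrete with a small dynamic-programming (Viterbi-style) recoder. It chooses synonymous codons that minimise the combined CpG + UpA count, counting dinucleotides both within codons and across codon junctions. Dynamic programming is our choice of technique for illustration and is not necessarily the inventors' in silico method. The sketch assumes the standard genetic code and deliberately ignores codon usage, codon pair bias, RNA secondary structure and overlapping reading frames; all names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (first, second, third base).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

TARGETS = ("CG", "UA")  # CpG and UpA in RNA notation


def count_targets(seq: str) -> int:
    """Number of CpG plus UpA dinucleotides in `seq`."""
    return sum(seq[i:i + 2] in TARGETS for i in range(len(seq) - 1))


def recode(protein: str) -> tuple[str, int]:
    """Choose synonymous codons that minimise CpG + UpA for `protein`.

    The cost of a sequence is the dinucleotides inside each codon plus
    those spanning each codon junction, so keeping the best path that
    ends in each possible codon gives an exact minimum.
    """
    if not protein:
        raise ValueError("protein sequence must not be empty")
    best = {c: (count_targets(c), [c]) for c in SYNONYMS[protein[0]]}
    for aa in protein[1:]:
        step = {}
        for codon in SYNONYMS[aa]:
            cost, path = min(
                ((prev_cost + count_targets(prev[-1] + codon[0]), prev_path)
                 for prev, (prev_cost, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            step[codon] = (cost + count_targets(codon), path + [codon])
        best = step
    cost, path = min(best.values(), key=lambda item: item[0])
    return "".join(path), cost


orf, remaining = recode("MRVYG")
print(orf, remaining)  # remaining is 1: the unavoidable UpA in the Tyr codon
```

Under these assumptions, the minimum achievable count for a peptide is one UpA per tyrosine codon, which is why a conservative substitution such as tyrosine to phenylalanine is the only way to remove it.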

Suitably the frequency ratio of the relevant dinucleotide (i.e. CpG or UpA) is 0.4 or lower, preferably 0.3 or lower, more preferably 0.2 or lower, and most preferably 0.1 or lower in the synthetic RNA expression vector as a whole. For the avoidance of doubt, the ‘frequency ratio’ is defined as the ratio of the observed dinucleotide frequency to the frequency expected from mononucleotide composition, i.e. f(CpG)/(f(C) × f(G)) for CpG and, correspondingly, f(UpA)/(f(U) × f(A)) for UpA. The wild type frequency ratio for each of CpG and UpA is typically around 0.4 in vertebrates and 0.5 among the RNA viruses which infect them.
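The frequency ratio defined above can be computed directly from a sequence. The minimal Python sketch below assumes that the observed frequency is taken over the len − 1 overlapping dinucleotide positions and the mononucleotide frequencies over all positions. Other normalisations exist and give slightly different values for short sequences. The function name and test sequences are hypothetical.

```python
def frequency_ratio(seq: str, dinuc: str) -> float:
    """Observed/expected ratio for a dinucleotide XpY.

    ratio = f(XpY) / (f(X) * f(Y)), where f(X) and f(Y) are mononucleotide
    frequencies and f(XpY) is the fraction of the len(seq) - 1 overlapping
    dinucleotide positions occupied by XpY.
    """
    rna = seq.upper().replace("T", "U")
    n = len(rna)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    x, y = dinuc
    expected = (rna.count(x) / n) * (rna.count(y) / n)
    if expected == 0:
        raise ValueError(f"{x} or {y} is absent, so the ratio is undefined")
    observed = sum(rna[i:i + 2] == dinuc for i in range(n - 1)) / (n - 1)
    return observed / expected


print(frequency_ratio("ACUGACUG" * 10, "CG"))         # 0.0: no CpG at all
print(round(frequency_ratio("ACGU" * 25, "CG"), 2))   # 4.04: CpG over-represented
```

A vector meeting the most preferred criterion would return 0.1 or lower for both "CG" and "UA".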

In a preferred embodiment, the region of nucleic acid with reduced frequency of CpG or UpA dinucleotides is in a sequence which encodes an expression product. It is thus preferred that the region with reduced frequency of CpG and/or UpA dinucleotides is an open reading frame (ORF).

Thus, it is typically preferred that the region or regions of the synthetic RNA expression vector in which the frequency of CpG and/or UpA dinucleotides has been reduced are coding regions of the vector of the present invention.

However, it is within the scope of the present invention that the frequency of CpG and/or UpA dinucleotides is reduced in regions other than coding regions. Thus, the frequency of CpG and/or UpA dinucleotides can be reduced in non-coding regions. Where regions outside of ORFs are altered to remove CpG and/or UpA dinucleotides, it is typically important that the alterations do not adversely affect the vector. For example, alterations in sequences responsible for replication, such as translation, transcription or replication elements, could lead to a loss of replication competency. Alternatively, alterations in expression control sequences could adversely affect expression of an expression product.

Another situation where it may be problematic to remove CpG and/or UpA dinucleotides is in sequences with overlapping ORFs. Overlapping ORFs occur where a given sequence codes for more than one expression product (e.g. a protein), with each expression product in a different reading frame (i.e. offset by one or two positions). This situation is uncommon other than in viruses, where there is pressure to maximise the coding capacity of the genome. In the case of overlapping ORFs, care must be taken that alterations to reduce the CpG and/or UpA content do not inadvertently disrupt the second reading frame. Of course, if only the expression product of the first reading frame is of interest, then it would not matter if the second reading frame was abrogated.
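The overlapping-ORF constraint can be checked mechanically. After a candidate change, re-translate every reading frame of interest and confirm that only frames allowed to change have changed. The minimal Python sketch below assumes the standard genetic code and input in RNA notation; the helper names and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna: str, frame: int = 0) -> str:
    """Translate the complete codons of `rna` read from offset `frame`."""
    return "".join(
        CODON_TABLE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3)
    )


def frames_preserved(original: str, modified: str, frames=(0, 1, 2)) -> dict:
    """Report, per reading frame, whether the encoded peptide is unchanged."""
    return {f: translate(original, f) == translate(modified, f) for f in frames}


# GCG -> GCA removes a CpG and is synonymous (Ala) in frame 0,
# but it changes the codons read in frames +1 and +2.
original = "GCGAUU"
modified = "GCAAUU"
print(frames_preserved(original, modified))  # {0: True, 1: False, 2: False}
```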

It is typically most preferred that substantially all of the coding regions of the synthetic RNA expression vector have been modified to have a reduced CpG and/or UpA dinucleotide frequency. However, where there are overlapping ORFs or other features which constrain the possibility of making silent changes to the sequence in some regions, it is preferred that all other coding regions are modified.

The term ‘non-constrained coding sequences’ can be used to refer to all coding sequences which are not constrained in terms of modifying their sequence through synonymous substitutions, e.g. because of overlapping ORFs. Thus, in preferred embodiments, substantially all non-constrained coding sequences of the vector have been modified to reduce the frequency of CpG and/or UpA dinucleotides.

Preferably, regions totalling at least 50% of the total length of the synthetic RNA expression vector have been modified to reduce the CpG and/or UpA dinucleotide frequency. More preferably, regions totalling at least 60% of the total length of the synthetic RNA expression vector have been modified, yet more preferably at least 70%.
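Checking whether modified regions total at least 50% (or 60%, 70%) of the vector is an interval-coverage calculation. The minimal Python sketch below assumes modified regions are recorded as 0-based, half-open coordinates. Overlapping or touching regions are merged so that shared bases are not counted twice. The coordinates in the example are hypothetical.

```python
def modified_fraction(regions, vector_length: int) -> float:
    """Fraction of a vector covered by modified regions.

    `regions` are (start, end) pairs in 0-based, half-open coordinates;
    overlapping or touching regions are merged so no base is counted twice.
    """
    merged: list[list[int]] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start for start, end in merged)
    return covered / vector_length


# Hypothetical coordinates for two overlapping modified regions in a 7,000 nt vector.
fraction = modified_fraction([(700, 3300), (3000, 5000)], 7000)
print(round(fraction, 3), fraction >= 0.6)  # 0.614 True
```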

It has been observed that enhancement of expression is generally dose dependent, with greater reductions in CpG and/or UpA dinucleotide frequency resulting in corresponding increases in expression. Thus, it is typically preferred that reduction of CpG and/or UpA dinucleotide frequency is maximised. This can be achieved in two ways: 1) maximising the proportion of the total sequence length in which the CpG and/or UpA dinucleotide frequency is reduced, and 2) maximising the extent to which the CpG and/or UpA dinucleotide frequency is reduced in those regions. Preferably both 1) and 2) are maximised in order to optimise expression.

Where the synthetic RNA expression vector comprises a sequence encoding a reporter expression product (e.g. luciferase), it is preferred that this sequence also has a reduced frequency of CpG and/or UpA dinucleotides.

In a particularly preferred embodiment of the present invention, the region having reduced frequency of CpG and/or UpA dinucleotides comprises a sequence of viral origin. More preferably it is a viral ORF. In a particularly preferred embodiment it is derived from a viral genome.

In a particularly preferred embodiment the synthetic RNA expression vector is a recombinant genome of an RNA virus. An RNA virus can be defined as any virus with a genome formed of RNA and which does not include a DNA intermediate as part of its life cycle. Examples of RNA viruses include influenza viruses, hepatitis C virus, SARS coronavirus, poliovirus, measles virus and West Nile virus. RNA viruses can also be defined as those that belong to groups III, IV or V of the Baltimore classification system of viruses.

Preferably the virus is a virus which infects humans. Alternatively, the virus is a virus which infects non-human animals, such as pigs, cattle, horses, dogs, cats, birds or sheep.

In preferred embodiments of the present invention the synthetic RNA expression vector comprises a recombinant single-stranded RNA (ssRNA) virus genome. Suitably the synthetic RNA expression vector comprises a recombinant negative-sense ssRNA virus genome, e.g. of any virus from Group V. Alternatively, the synthetic RNA expression vector comprises a recombinant positive-sense ssRNA virus genome, e.g. of any virus from Group IV.

In alternative embodiments, the synthetic RNA expression vector comprises a recombinant double-stranded RNA (dsRNA) virus genome, e.g. of any virus from Group III.

In a particularly preferred embodiment the synthetic RNA expression vector comprises an RNA virus genome adapted for expression in a suitable expression system for the production of a virus vaccine. Production of such RNA virus vaccines typically involves production of a replication competent virus, followed by its inactivation prior to use as a vaccine. Commonly used inactivated human RNA virus vaccines have been developed for poliovirus, influenza A and B viruses, hepatitis A virus, hepatitis E virus, rabies virus and tick-borne encephalitis virus. Thus these virus vaccines are particularly suited for use in the present invention.

However, the invention can of course be applied to any modified RNA virus used for vaccination, for example for veterinary use.

In one highly preferred embodiment of the present invention the synthetic RNA expression vector comprises a recombinant influenza A virus genome with reduced CpG and UpA dinucleotide frequencies.

In another embodiment of the present invention, the synthetic RNA expression vector comprises a recombinant echovirus genome in which coding regions have been modified to reduce CpG and/or UpA (preferably both) dinucleotide frequencies. In one particular example, there is provided an echovirus 7 genome in which wild type region 1 and/or region 2 (as defined below) have been modified to reduce CpG and/or UpA (preferably both) dinucleotide frequencies. For example, regions 1 and/or 2 can be modified by replacing the wild type sequence with SEQ ID NOS 3 to 8 described below, as appropriate. In particular, SEQ ID NOS 7 and 8 can be inserted to replace the wild type sequences.

In a further aspect of the present invention there is provided a virion comprising a synthetic RNA expression vector as defined above. Preferably the virion is capable of infecting a suitable host cell. The virion comprises the synthetic RNA expression vector and viral coat proteins. The virion may further comprise an envelope.

Suitably the virion is an RNA viral vaccine. Viral vaccines expressing heterologous pathogen antigens can be used as vaccines against these pathogens, based on the same rationale as DNA vaccines. Viral vaccines comprise a modified viral genome adapted to produce one or more antigens from a given pathogen. The viral vaccine is delivered to the cells of the body, where the antigen is expressed. When the antigens are processed by the host cells and displayed on their surface, they are recognised as foreign, and this stimulates a range of immune responses. In such viral vaccines it is desirable to maximise expression of the antigen, and thus the present invention is highly relevant.

In many embodiments of the invention, the synthetic RNA expression vector or virion of the present invention is in isolated form. The term “isolated” means a biological component (such as a nucleic acid molecule or protein) that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extrachromosomal DNA and RNA, and proteins. Nucleic acids and proteins that have been “isolated” include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids, proteins and peptides.

The term ‘purified’ does not require absolute purity; rather, it is intended as a relative term. Thus, for example, a purified form is one in which the vector or virion is more enriched than it is in its environment within a cell, such that the vector or virion is substantially separated from cellular components (e.g. lipids, carbohydrates, other nucleic acids and other polypeptides) that may accompany it. In another example, a purified preparation is one in which the vector or virion is substantially free from contaminants. In one example, a vector or virion of the disclosure is purified when at least 50% by weight of a sample is composed of the vector or virion, for example when at least 60%, 70%, 80%, 85%, 90%, 92%, 95%, 98% or 99% or more of a sample is composed of the vector or virion.

A ‘recombinant’ or ‘synthetic’ nucleic acid is one that has a sequence that is not naturally occurring and/or has a sequence that is made by an artificial combination of two otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Suitable techniques are set out in Green & Sambrook, Molecular Cloning: A Laboratory Manual (Fourth Edition), 2012, Cold Spring Harbor Laboratory Press.

In a further aspect the present invention provides a host cell comprising a synthetic RNA expression vector as defined above.

Preferably the cell is a eukaryotic cell, more preferably an animal cell, more preferably a mammalian cell. Suitable host cells will depend on the nature of the vector, especially the regulatory sequences it contains.

In a further aspect the present invention provides a clonal cell population derived from the host cell described above.

In another aspect the present invention provides an egg comprising cells which comprise a synthetic RNA expression vector as defined above.

Suitably the egg is an avian egg, more preferably a chicken egg. For example, the present invention provides an avian egg inoculated with a synthetic RNA expression vector as defined above, which encodes an RNA virus, e.g. an influenza virus.

In another aspect the present invention provides an expression system comprising cells comprising a synthetic RNA expression vector as defined above, and a suitable growth medium.

In a further aspect, the present invention provides an RNA virus vaccine composition, the composition comprising a virion which comprises a synthetic RNA expression vector as defined above, the vector comprising a sequence encoding one or more antigens. The antigen is preferably heterologous, i.e. it is derived from (and intended to raise immunity against) a pathogen which is different from the virus from which the synthetic RNA expression vector or the virion proteins are derived. Suitably the RNA virus vaccine composition comprises a pharmaceutically acceptable carrier, excipient or adjuvant. The RNA virus vaccine composition is preferably formulated for delivery to a subject in need of vaccination, e.g. an animal. Delivery can suitably be via oral or parenteral routes. Actual methods for preparing administrable compositions, whether for intravenous or subcutaneous administration or otherwise, will be known or apparent to those skilled in the art and are described in more detail in such publications as Remington's Pharmaceutical Science, 19th ed., Mack Publishing Company, Easton, Pa. (1995).

In another aspect the present invention provides a method of producing a synthetic RNA expression vector, the method comprising:

- providing a primary nucleotide sequence of interest (e.g. a gene, an ORF or a portion thereof);
- identifying at least one region of said primary nucleotide sequence susceptible to modification to reduce the frequency of CpG and/or UpA dinucleotides;
- identifying one or more sequence modifications in said at least one region which will reduce the frequency of CpG and/or UpA dinucleotides;
- providing a modified nucleotide sequence comprising some or all of said sequence modifications; and
- producing a synthetic RNA expression vector comprising said modified sequence, which has a reduced frequency of CpG and/or UpA dinucleotides compared to a corresponding (i.e. otherwise identical) synthetic RNA expression vector which comprises the primary nucleotide sequence.
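The in silico part of this method can be sketched as follows: identify CpG and UpA sites, then confirm that a proposed modified sequence is synonymous and carries fewer of them than the primary sequence. This minimal Python illustration assumes a coding sequence in RNA notation, read in frame 0 under the standard genetic code. The function names and the three-codon example are hypothetical. Real vector design would also have to respect the constraints discussed above for non-coding and overlapping regions.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
TARGETS = ("CG", "UA")  # CpG and UpA in RNA notation


def find_sites(rna: str) -> list[tuple[int, str, int]]:
    """List (position, dinucleotide, codon index) for each CpG or UpA.

    The codon index is that of the codon in which the dinucleotide starts,
    so a junction-spanning site is attributed to the upstream codon.
    """
    return [
        (i, rna[i:i + 2], i // 3)
        for i in range(len(rna) - 1)
        if rna[i:i + 2] in TARGETS
    ]


def translate(rna: str) -> str:
    """Translate complete codons in frame 0."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def compare(primary: str, modified: str) -> dict:
    """Check that a modified ORF is synonymous and report (before, after) counts."""
    before = [site[1] for site in find_sites(primary)]
    after = [site[1] for site in find_sites(modified)]
    return {
        "synonymous": translate(primary) == translate(modified),
        "CG": (before.count("CG"), after.count("CG")),
        "UA": (before.count("UA"), after.count("UA")),
    }


primary = "AUGCGUUAC"   # Met-Arg-Tyr (hypothetical example)
modified = "AUGAGAUAU"  # CGU -> AGA (Arg), UAC -> UAU (Tyr)
print(find_sites(primary))        # [(3, 'CG', 1), (6, 'UA', 2)]
print(compare(primary, modified))
```

In this example the CpG is removed, but the UpA in the tyrosine codon remains, consistent with the constraint on tyrosine described earlier.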

Suitably, manipulation of sequences (e.g. assembly of component parts of an expression vector) can be performed in a DNA ‘intermediate’ for subsequent transcription into an RNA form. Manipulation of DNA is typically much more straightforward than direct manipulation of RNA, and thus the present invention contemplates the use of DNA polynucleotides as ‘working’ molecules where required. References to ‘reducing the frequency of CpG and UpA dinucleotides’ should thus be understood as including corresponding changes in the DNA intermediates (which of course will not include uracil, UpA being represented by TpA in DNA) which result in a reduction in frequency in the RNA end product. An exemplary methodology using a DNA intermediate is described in detail below in respect of E7, and similar techniques for other viral vectors would be apparent to the skilled person. Accordingly, the method may comprise preparing a DNA polynucleotide which encodes a synthetic polynucleotide having a reduced CpG and/or UpA frequency. It may also comprise the step of transcribing said DNA polynucleotide to form a synthetic RNA polynucleotide having a reduced CpG and/or UpA frequency.
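Because the DNA intermediate carries T where the RNA product carries U, a reduction in TpA in the DNA corresponds to a reduction in UpA in the transcript. The minimal Python sketch below assumes the DNA is written as the sense (coding) strand, so the transcript has the same sequence with U in place of T. If the template strand were recorded instead, it would first need to be reverse-complemented. The names are hypothetical.

```python
def transcribe(dna: str) -> str:
    """RNA sequence of a transcript, assuming `dna` is the coding (sense) strand."""
    return dna.upper().replace("T", "U")


def count(seq: str, dinuc: str) -> int:
    """Number of overlapping occurrences of `dinuc` in `seq`."""
    return sum(seq[i:i + 2] == dinuc for i in range(len(seq) - 1))


dna = "ATGCGTTAC"
rna = transcribe(dna)
print(rna)  # AUGCGUUAC
# CpG is unchanged by transcription; TpA in the sense strand maps one-to-one to UpA.
print(count(dna, "CG") == count(rna, "CG"), count(dna, "TA") == count(rna, "UA"))  # True True
```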

Modification of the sequence will typically involve making synonymous substitutions, which do not change the encoded amino acid sequence, but in some cases may involve making an alteration which results in a conservative amino acid substitution, as discussed in more detail above. In silico methods for identifying suitable sequence changes are preferred.

According to another aspect of the present invention there is provided a method of producing an expression product, the method comprising:

- providing a host cell comprising a synthetic RNA expression vector as defined above;
- incubating said cell under suitable conditions to induce expression from the vector; and
- recovering the expression product.

Suitably the expression product comprises a modified or wild type viral protein.

Suitably the expression product is a virion.

Suitably the method includes the following steps:

- providing an RNA sequence encoding an expression product;
- altering the nucleotide composition of said sequence to reduce the frequency of CpG and/or UpA dinucleotides; and
- introducing a synthetic RNA expression vector comprising said altered sequence into a host cell.

The method may comprise transducing a cell with the synthetic RNA expression vector. This is of course particularly relevant where the synthetic RNA expression vector is comprised in a virion which is able to infect the cell.

In a preferred embodiment the method is a method for production of a viral vaccine. Such a method suitably comprises providing a virion comprising a synthetic RNA expression vector as defined above, introducing said synthetic RNA expression vector into a cell (e.g. in egg culture), incubating said cell under suitable conditions to produce viral proteins and thereby allow replication of the virion in the cell, and then inactivating the virion prior to use as a vaccine.

The method may suitably involve at least partially purifying the virion thereby produced.

In certain embodiments, the method can comprise a method of increasing the rate of replication of a synthetic virus within a host system by reducing the frequency of CpG and UpA dinucleotides in the virus compared with normal frequency.

In a further aspect the present invention provides a synthetic RNA expression vector as set out above for use in a method of treating or immunising against a disease.

In this aspect the synthetic RNA expression vector can be in the form of a viral vaccine.

In a further aspect the present invention provides a method of treating or preventing a disease by administering a pharmaceutical composition comprising the synthetic RNA expression vector as set out above (e.g. as a viral vaccine).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. RNA to infectivity ratios of WT and viruses with modified CpG/UpA frequencies. WT and mutant viruses were recovered from RD cells and titred by TCID50. The number of viral genome copies was determined through qRT-PCR and compared with the infectivity titre. Results are the mean and standard error from three separate extractions.

FIG. 2A-C. Replication kinetics of WT and modified viruses infected at a low MOI. RD cells were infected with E7 WT, permuted, CpG/UpA-high mutant (A) and CpG/UpA-low mutant (B) viruses at an MOI of 0.01. The inoculum was removed and cells washed after 1 hour. The infectious titre of cell supernatants was then analysed at a range of time points by TCID50. Results are the mean of three biological replicates. The mean titre and standard error of all the viruses at 24 hours post infection are shown in (C).

FIG. 3A-B. Plaque morphology of E7 WT and modified viruses. RD cell monolayers in 10 cm plates were infected with a similar infectious titre of virus and incubated for 96 hours at 37° C. (A). Plaque area was determined using ImageJ software (B). Results are the mean of an equal number of plaques selected randomly from one plate per virus. Asterisks show a significant difference from the WT value as determined by t test (*p<0.05, **p<0.01).

FIG. 4. Synchronised infection with equal viral genome copies. Cells were synchronously infected with 1000 genome copies per cell of WT, R1/R2 CpG-high or R1/R2 UpA-high virus, as calculated using qRT-PCR. Cells were trypsinised and washed 1 or 4 hours post infection and the intracellular viral load determined by qRT-PCR. Results are the mean and standard error of three biological replicates.

FIG. 5. Analysis of luciferase expression driven by E7 replicons with reduced CpG/UpA frequencies. Replicons were generated with reduced CpG/UpA frequencies based on the backbone pRiboE7luc replicon, in which the structural genes of E7 are replaced by an insect luciferase gene. In the pRiboE7(CpG/UpA low)luc replicon, the luciferase gene itself was modified to minimise both CpG and UpA frequency; in the pRiboE7(CpG/UpA low)luc R2 CpG-low and pRiboE7(CpG/UpA low)luc R2 UpA-low replicons, Region 2 was additionally modified to further reduce either CpG or UpA frequency. RNA was generated from the replicons and 50 ng was transfected into RD cells.

Luminescence was measured relative to the mock-transfected control. Results are the mean and standard error of three biological replicates.

FIG. 6. Fitness determination by competition assays between WT and modified viruses. Cells were infected with an equal MOI of WT and modified virus, and the supernatant was serially passaged through cells. RNA was isolated and the composition of each virus determined through selective restriction digestion (enzymes used are given in Table 3). Images show the virus composition in the starting inoculum and in three biological replicates following passage.

FIG. 7A-B. Pairwise fitness comparison between CpG-low and UpA-low viruses. Cells were infected with an equal MOI of two viruses and the supernatant serially passaged. The composition of each virus was determined through selective digestion and is displayed by differential shading (A). The more rapidly the virus on the left out-competed the virus shown above, the darker the shading. A fitness ranking was then determined (B).

FIG. 8. Genome organisation of E7 and positions of mutated insert regions. Insert positions are compared with a genome diagram and with plots of sequence variability within species B at synonymous sites (dotted line) and of folding energies indicative of RNA secondary structure (solid line). Variability at synonymous sites (left y-axis) was computed at each codon position in alignments and plotted with a window size of 41 codons. MFED values (right y-axis) for sense and antisense RNA sequences were calculated for 200-base fragments, incrementing by 48 bases; values plotted represent mean values of 5 consecutive fragments. Nucleotide positions were calculated relative to the pT7:E7 clone sequence.

FIG. 9. Effect of CpG and UpA frequency changes on the replication of TMEV in mouse RAW cells. Removal of CpG and UpA dinucleotides in the non-structural gene region of the genome enhanced replication, as determined by quantitative PCR of TMEV RNA sequences at 24 hours post infection (y-axis scale). Conversely, addition of CpG and UpA dinucleotides in this genome region suppressed replication. The degree of replication enhancement and attenuation was comparable to that observed in mutants of echovirus 7 with similar extents of sequence replacement (single genome regions; Atkinson et al. 2014, Nucleic Acids Research, gku075).

FIG. 10. Effect of CpG and UpA reduction in the luciferase gene on gene expression and replication of the HCV replicon. Removal of CpGs and UpAs enhanced luciferase expression immediately post transfection and accelerated replication relative to the unmodified Con-1 replicon for at least 96 hours. Pol- is the non-replicating control RNA (GDD motif in the RNA polymerase mutated to GND).

FIG. 11. Quantitation of capsid protein synthesis by Western blot of protein extracted from cells and cell-free supernatant at 18 hours after infection with wild type (WT) or CpG/UpA-low mutants of E7 (moderate cytopathic effect). Viral proteins were detected using a VP1-specific monoclonal antibody (DAKO, Clone 5-D8/1) and levels compared by densitometry (values shown above panels, standardised to wild-type levels).

DETAILED DESCRIPTION

Materials and Methods

Cell Culture and Cell Lines

E7 was propagated in rhabdomyosarcoma (RD) cells using Dulbecco's modified Eagle medium (DMEM) with 10% foetal calf serum (FCS), penicillin (100 U/ml) and streptomycin (100 μg/ml). All cells were maintained at 37° C. with 5% CO₂.

In Silico Design of CpG and UpA Modified Viruses

Two regions of the full-length E7 cDNA clone pT7:E7 were selected for mutagenesis, each bounded by unique restriction sites: SalI (genome position 1878) and HpaI (genome position 3119) for Region 1, and EcoRI (genome position 5403) and BglII (genome position 6462) for Region 2. To generate CpG-low mutants, all CpG dinucleotides were eliminated by replacing either the C or the G with a randomly selected alternative base chosen to preserve the coding of the underlying sequence. A similar strategy was used to generate UpA-low mutants, except that UA(C/U) codons encoding tyrosine prevented elimination of all UpA dinucleotides. CpG-high and UpA-high sequences were generated by introducing as many CpG or UpA dinucleotides as possible while preserving coding. The sequence changes and base compositions of the resulting insert sequences are shown in Table 1.
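The codon-level logic of this recoding step can be illustrated with a short Python sketch. It is a simplified, hypothetical variant, not the procedure actually used to design the inserts: instead of replacing the C or G of each CpG with a random coding-compatible base, the illustrative helper `recode` greedily chooses, codon by codon, a synonymous codon that creates the fewest target dinucleotides (counting codon junctions), with random tie-breaking. It assumes an in-frame ORF and the standard genetic code.

```python
import itertools
import random

# Standard genetic code: codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(orf):
    """Translate an in-frame RNA ORF ('*' marks stop codons)."""
    return "".join(CODON_TO_AA[orf[i:i + 3]] for i in range(0, len(orf), 3))


def count_dinucs(seq, targets):
    """Count overlapping occurrences of any target dinucleotide in seq."""
    return sum(seq[i:i + 2] in targets for i in range(len(seq) - 1))


def recode(orf, targets=("CG", "UA"), seed=0):
    """Hypothetical greedy synonymous recoding that minimises target dinucleotides.

    Each candidate codon is scored together with the last base of the previous
    (already recoded) codon and the first base of the next (original) codon,
    so junction dinucleotides are counted too. Ties are broken at random.
    """
    rng = random.Random(seed)
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    out = []
    for i, codon in enumerate(codons):
        prev_base = out[-1][-1] if out else ""
        next_base = codons[i + 1][0] if i + 1 < len(codons) else ""
        options = SYNONYMS[CODON_TO_AA[codon]]
        scores = {c: count_dinucs(prev_base + c + next_base, targets) for c in options}
        best = min(scores.values())
        out.append(rng.choice([c for c in options if scores[c] == best]))
    return "".join(out)


orf = "AUGCGCGCCGGUUUCUAA"  # toy ORF: Met-Arg-Ala-Gly-Phe-stop
new = recode(orf)
print(new, translate(new))
```

Passing `targets=("CG",)` or `targets=("UA",)` gives CpG-only or UpA-only designs. UpA inside tyrosine codons (UAU/UAC) survives any synonymous recoding, which matches the restriction noted above.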

TABLE 1 Composition of Region 1 and 2 insert sequences

| Region | Sequence | G + C content | CpG freq^(a) | CpG change^(b) | CpG ratio^(c) | UpA freq^(a) | UpA change^(b) | UpA ratio^(c) |
|---|---|---|---|---|---|---|---|---|
| 1 | Native | 47.6% | 0.041 | — | 0.730 | 0.050 | — | 0.742 |
| 1 | Permuted | 47.6% | 0.041 | 0 | 0.730 | 0.050 | 0 | 0.742 |
| 1 | CpG-low | 44.3% | 0 | −51 | 0 | 0.057 | +8 | 0.741 |
| 1 | UpA-low | 50.6% | 0.045 | +5 | 0.703 | 0.015 | −43 | 0.256 |
| 1 | UpA/CpG-low | 47.5% | 0 | −51 | 0 | 0.015 | −43 | 0.227 |
| 1 | CpG-high | 56.5% | 0.146 | +129 | 1.828 | 0.042 | −10 | 0.900 |
| 1 | UpA-high | 40.9% | 0.032 | −12 | 0.756 | 0.139 | +109 | 1.593 |
| 2 | Native | 47.6% | 0.018 | — | 0.320 | 0.047 | — | 0.695 |
| 2 | Permuted | 47.6% | 0.018 | 0 | 0.320 | 0.047 | 0 | 0.695 |
| 2 | CpG-low | 44.3% | 0 | −18 | 0 | 0.047 | — | 0.695 |
| 2 | UpA-low | 50.6% | 0.021 | +3 | 0.331 | 0.014 | −34 | 0.229 |
| 2 | UpA/CpG-low | 47.5% | 0 | −18 | 0 | 0.014 | −34 | 0.215 |
| 2 | CpG-high | 56.5% | 0.133 | +116 | 1.667 | 0.037 | −10 | 0.824 |
| 2 | UpA-high | 40.9% | 0.015 | −3 | 0.390 | 0.149 | +103 | 1.633 |

^(a)Frequency of dinucleotide in insert region. ^(b)Change in the number of dinucleotides (CpG or UpA) between mutated and original WT sequence. ^(c)Ratio of observed dinucleotide frequency to that expected based on mononucleotide composition, i.e. f(CpG)/(f(C) × f(G)).
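The quantities in Table 1 follow directly from base counts, as the Python sketch below shows. It assumes that a dinucleotide frequency is the count divided by the number of overlapping dinucleotide positions (length minus 1), which the table does not state explicitly; the function name `composition` is illustrative.

```python
from collections import Counter


def composition(seq):
    """G+C content, CpG/UpA frequencies and observed/expected ratios.

    Assumes dinucleotide frequency = count / (len(seq) - 1) and expected
    frequency = f(X) * f(Y), as in footnote (c) of Table 1.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    counts = Counter(seq)
    f = {b: counts[b] / n for b in "ACGU"}
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    result = {"GC": f["G"] + f["C"]}
    for name, (x, y) in (("CpG", "CG"), ("UpA", "UA")):
        freq = pairs[x + y] / (n - 1)
        expected = f[x] * f[y]
        result[name + "_freq"] = freq
        result[name + "_ratio"] = freq / expected if expected else float("nan")
    return result


print(composition("GUCGACUCCGUGGUGCCCGUCAAC"))
```

A ratio below 1 indicates under-representation relative to what the mononucleotide composition alone would predict.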

The specific sequences of the wild type (WT), CpG-low, UpA-low and CpG- and UpA-low inserts for each of Regions 1 and 2 were as follows:

Echovirus 7 WT Region 1 (SEQ ID NO 1)

GUCGACUCCGUGGUGCCCGUCAACAAUAUCAAAGUCAACCUGCAAAGCAUGGAUGCGUAUCAUAUUGAGGUCAAUACCGGGAACCACCAGGGGGAAAAGAUUUUUGCGUUCCAAAUGCAGCCGGGGUUAGAGUCUGUUUUCAAGAGAACCCUUAUGGGGGAGAUUCUUAAUUAUUAUGCACACUGGUCAGGGAGCAUUAAGCUGACAUUCACAUUUUGUGGAUCGGCGAUGGCAACUGGAAAACUCUUGUUAGCGUAUUCACCACCAGGUGCUGAUGUGCCCGCGACCAGGAAACAGGCGAUGUUAGGCACACACAUGAUUUGGGAUAUCGGGCUUCAGUCGAGCUGUGUUUUGUGCAUCCCAUGGAUAAGUCAGACACACUACCGGUUAGUGCAACAAGAUGAAUACACGAGUGCAGGCAAUGUGACGUGUUGGUACCAAACAGGAAUAGUGGUGCCCCCUGGCACUCCAAAUAAGUGUGUAGUGCUUUGUUUUGCAUCAGCUUGUAAUGAUUUCUCAGUUCGAAUGCUUAGGGACACCCCUUUCAUCGGACAAACAGCACUGCUGCAAGGCGACACCGAAACGGCUAUUGACAAUGCAAUCGCCAGGGUAGCAGAUACGGUGGCGAGCGGUCCUAGUAAUUCGACCAGUAUCCCAGCACUCACAGCAGUUGAGACAGGUCACACGUCACAAGUCGAGCCCAGCGAUACAAUGCAGACUAGACAUGUCAAAAACUACCACUCGCGUUCUGAGUCAACCGUGGAAAACUUUCUAAGUCGCUCCGCUUGUGUGUACAUCGAAGAGUACUACACCAAGGACCAAGACAAUGUUAAUAGGUACAUGUCGUGGACAAUAAAUGCCAGAAGAAUGGUGCAAUUGAGGAGAAAGUUUGAGCUGUUUACAUACAUGAGAUUUGAUAUGGAAAUCACGUUUGUAAUCACAAGUAGACAACUACCUGGGACUAGCAUAGCACAAGAUAUGCCGCCACUCACCCACCAGAUCAUGUACAUACCACCAGGUGGCCCGGUACCAAACAGCGUAACAGAUUUUGCGUGGCAGACAUCAACAAACCCCAGCAUUUUCUGGACAGAAGGAAACGCGCCACCUCGCAUGUCUAUUCCAUUCAUCAGUAUUGGCAAUGCAUAUAGCAACUUCUAUGACGGGUGGUCACACUUUUCCCAAAACGGUGUGUACGGAUACAACGCCCUGAACAACAUGGGCAAGCUGUACGCACGUCAUGUUAAC

Echovirus 7 WT Region 2 (SEQ ID NO 2)

GAAUUCGCCGUUGCUAUGAUGAAGAGAAACUCAAGUACAGUGAAGACUGAGUAUGGUGAGUUUACUAUGCUGGGCAUCUAUGACAAGUGGGCCGUUUUGCCACGCCAUGCUAAACCUGGACCAACCAUCCUGAUGAAUGACCAAGAGGUCGGCGUGUUAGACGCCAAGGAACUAGUGGACAAGGAUGGCACUAACCUGGAGCUGACACUACUCAAGUUAAACCGGAAUGAGAAGUUCAGAGACAUCAGAGGCUUCUUGGCUAAGGAGGAAGUGGAAGUCAACGAGGCUGUGCUGGCAAUAAACACUAGCAAGUUUCCUAACAUGUACAUUCCAGUAGGGCAGGUUACAGAUUACGGCUUCCUAAACCUGGGUGGUACACCCACCAAAAGAAUGCUUAUGUAUAACUUCCCCACAAGAGCAGGCCAGUGUGGCGGGGUACUCAUGUCCACUGGCAAAGUUUUGGGAAUCCAUGUUGGUGGAAAUGGCCAUCAAGGCUUCUCAGCAGCACUUCUCAAACACUACUUUAAUGAUGAACAAGGAGAGAUUGAGUUCAUUGAGAGUUCAAAGGAAGCAGGGUUCCCAAUCAUUAACGCACCCAGUAAAACCAAGCUGGAGCCAAGUGUCUUCCACCAAGUAUUUGAAGGCAACAAAGAGCCAGCAGUCCUCAGGAACAGUGACCCACGUCUCAAAGCUAAUUUCGAGGAGGCCAUCUUUUCCAAAUACAUUGGGAAUGUCAACACACACAUAGAUGAAUACAUGUUGGAGGCUGUUGACCAUUAUGCCGGACAAUUGGCCACCCUAGAUAUCAGCACUGAACCAAUGAAGUUGGAGGAUGCUGUGUACGGUACUGAAGGCCUUGAAGCUCUUGACUUAACAACAAGUGCAGGCUACCCCUAUGUCGCACUGGGUAUCAAGAAGAGAGACAUCCUCUCGAAGAAGACCAAGGACCUGACCAAGCUGAAAGAGUGCAUGGAUAAGUAUGGCCUGAAUCUACCAAUGGUGACAUACGUGAAAGAUGAACUCAGAUCU

CpG-low Region 1 (SEQ ID NO 3)

GUCGACUCAGUGGUGCCAGUCAACAAUAUCAAAGUCAACCUGCAAAGCAUGGAUGCUUAUCAUAUUGAGGUCAAUACAGGGAACCACCAGGGGGAAAAGAUUUCUGCUUUCCAAAUGCAGCCUGGGUUAGAGUCUGUUUUCAAGAGAACCCUUAUGGGGGAGAUUCUUAAUUAUUAUGCACACUGGUCAGGGAGCAUUAAGCUGACAUUCACAUUUUGUGGAUCUGCCAUGGCAACUGGAAAACUCUUGUUAGCUUAUUCACCACCAGGUGCUGAUGUGCCUGCAACCAGGAAACAGGCUAUGUUAGGCACACACAUGAUUUGGGAUAUAGGGCUUCAGUCCAGCUGUGUUUUGUGCAUCCCAUGGAUAAGUCAGACACACUACAGGUUAGUGCAACAAGAUGAAUACACAAGUGCAGGCAAUGUGACAUGUUGGUACCAAACAGGAAUAGUGGUGCCCCCUGGCACUCCAAAUAAGUGUGUAGUGCUUUGUUUUGCAUCAGCUUGUAAUGAUUUCUCAGUUAGGAUGCUUAGGGACACCCCUUUCAUAGGACAAACAGCACUGCUGCAAGGAGACACAGAAACAGCUAUUGACAAUGCAAUUGCCAGGGUAGCAGAUACUGUGGCAAGUGGUCCUAGUAAUUCAACCAGUAUCCCAGCACUCACAGCAGUUGAGACAGGUCACACCUCACAAGUGGAGCCCAGUGAUACAAUGCAGACUAGACAUGUCAAAAACUACCACUCUAGGUCUGAGUCAACUGUGGAAAACUUUCUAAGUAGGUCAGCUUGUGUGUACAUAGAAGAGUACUACACCAAGGACCAAGACAAUGUUAAUAGGUACAUGUCCUGGACAAUAAAUGCCAGAAGAAUGGUGCAAUUGAGGAGAAAGUUUGAGCUGUUUACAUACAUGAGAUUUGAUAUGGAAAUCACCUUUGUAAUCACAAGUAGACAACUACCUGGGACUAGCAUAGCACAAGAUAUGCCACCACUCACCCACCAGAUCAUGUACAUACCACCAGGUGGCCCAGUACCAAACAGUGUAACAGAUUUUGCCUGGCAGACAUCAACAAACCCCAGCAUUUUCUGGACAGAAGGAAAUGCCCCACCUAGGAUGUCUAUUCCAUUCAUCAGUAUUGGCAAUGCAUAUAGCAACUUCUAUGAUGGGUGGUCACACUUUUCCCAAAAUGGUGUGUAUGGAUACAAUGCCCUGAACAACAUGGGCAAGCUGUAUGCAAGACAUGUUAAC

CpG-low Region 2 (SEQ ID NO 4)

GAAUUCGCUGUUGCUAUGAUGAAGAGAAACUCAAGUACAGUGAAGACUGAGUAUGGUGAGUUUACUAUGCUGGGCAUCUAUGACAAGUGGGCAGUUUUGCCAAGGCAUGCUAAACCUGGACCAACCAUCCUGAUGAAUGACCAAGAGGUUGGGGUGUUAGAUGCCAAGGAACUAGUGGACAAGGAUGGCACUAACCUGGAGCUGACACUACUCAAGUUAAACAGAAAUGAGAAGUUCAGAGACAUCAGAGGCUUCUUGGCUAAGGAGGAAGUGGAAGUCAAUGAGGCUGUGCUGGCAAUAAACACUAGCAAGUUUCCUAACAUGUACAUUCCAGUAGGGCAGGUUACAGAUUAUGGCUUCCUAAACCUGGGUGGUACACCCACCAAAAGAAUGCUUAUGUAUAACUUCCCCACAAGAGCAGGCCAGUGUGGAGGGGUACUCAUGUCCACUGGCAAAGUUUUGGGAAUCCAUGUUGGUGGAAAUGGCCAUCAAGGCUUCUCAGCAGCACUUCUCAAACACUACUUUAAUGAUGAACAAGGAGAGAUUGAGUUCAUUGAGAGUUCAAAGGAAGCAGGGUUCCCAAUCAUUAAUGCACCCAGUAAAACCAAGCUGGAGCCAAGUGUCUUCCACCAAGUAUUUGAAGGCAACAAAGAGCCAGCAGUCCUCAGGAACAGUGACCCAAGGCUCAAAGCUAAUUUUGAGGAGGCCAUCUUUUCCAAAUACAUUGGGAAUGUCAACACACACAUAGAUGAAUACAUGUUGGAGGCUGUUGACCAUUAUGCAGGACAAUUGGCCACCCUAGAUAUCAGCACUGAACCAAUGAAGUUGGAGGAUGCUGUGUAUGGUACUGAAGGCCUUGAAGCUCUUGACUUAACAACAAGUGCAGGCUACCCCUAUGUGGCACUGGGUAUCAAGAAGAGAGACAUCCUCUCAAAGAAGACCAAGGACCUGACCAAGCUGAAAGAGUGCAUGGAUAAGUAUGGCCUGAAUCUACCAAUGGUGACAUAUGUGAAAGAUGAACUCAGAUCU

UpA-low Region 1 (SEQ ID NO 5)

GUCGACUCCGUGGUGCCCGUCAACAACAUCAAAGUCAACCUGCAAAGCAUGGAUGCGUAUCACAUUGAGGUCAACACCGGGAACCACCAGGGGGAAAAGAUUUUUGCGUUCCAAAUGCAGCCGGGGUUGGAGUCUGUUUUCAAGAGAACCCUCAUGGGGGAGAUUCUCAAUUAUUAUGCACACUGGUCAGGGAGCAUCAAGCUGACAUUCACAUUUUGUGGAUCGGCGAUGGCAACUGGAAAACUCUUGUUGGCGUAUUCACCACCAGGUGCUGAUGUGCCCGCGACCAGGAAACAGGCGAUGUUGGGCACACACAUGAUUUGGGACAUCGGGCUUCAGUCGAGCUGUGUUUUGUGCAUCCCAUGGAUCAGUCAGACACACUACCGGUUGGUGCAACAAGAUGAAUACACGAGUGCAGGCAAUGUGACGUGUUGGUACCAAACAGGAAUUGUGGUGCCCCCUGGCACUCCAAACAAGUGUGUCGUGCUUUGUUUUGCAUCAGCUUGCAAUGAUUUCUCAGUUCGAAUGCUGAGGGACACCCCUUUCAUCGGACAAACAGCACUGCUGCAAGGCGACACCGAAACGGCGAUUGACAAUGCAAUCGCCAGGGUUGCAGACACGGUGGCGAGCGGUCCGAGCAAUUCGACCAGCAUCCCAGCACUCACAGCAGUUGAGACAGGUCACACGUCACAAGUCGAGCCCAGCGACACAAUGCAGACCAGACAUGUCAAAAACUACCACUCGCGUUCUGAGUCAACCGUGGAAAACUUUCUCAGUCGCUCCGCUUGUGUGUACAUCGAAGAGUACUACACCAAGGACCAAGACAAUGUCAACAGGUACAUGUCGUGGACAAUCAAUGCCAGAAGAAUGGUGCAAUUGAGGAGAAAGUUUGAGCUGUUCACAUACAUGAGAUUUGACAUGGAAAUCACGUUUGUCAUCACAAGCAGACAACUUCCUGGGACGAGCAUCGCACAAGACAUGCCGCCACUCACCCACCAGAUCAUGUACAUCCCACCAGGUGGCCCGGUCCCAAACAGCGUCACAGAUUUUGCGUGGCAGACAUCAACAAACCCCAGCAUUUUCUGGACAGAAGGAAACGCGCCACCUCGCAUGUCCAUUCCAUUCAUCAGCAUUGGCAAUGCAUACAGCAACUUCUAUGACGGGUGGUCACACUUUUCCCAAAACGGUGUGUACGGAUACAACGCCCUGAACAACAUGGGCAAGCUGUACGCACGUCAUGUUAAC

UpA-low Region 2 (SEQ ID NO 6)

GAAUUCGCCGUUGCCAUGAUGAAGAGAAACUCAAGCACAGUGAAGACUGAGUAUGGUGAGUUCACGAUCCUGGGCAUCUAUGACAAGUGGGCCGUUUUGCCACGCCAUGCCAAACCUGGACCAACCAUCCUGAUGAAUGACCAAGAGGUCGGCGUGUUGGACGCCAAGGAACUGGUGGACAAGGAUGGCACAAACCUGGAGCUGACACUCCUCAAGUUGAACCGGAAUGAGAAGUUCAGAGACAUCAGAGGCUUCUUGGCGAAGGAGGAAGUGGAAGUCAACGAGGCUGUGCUGGCAAUCAACACCAGCAAGUUUCCAAACAUGUACAUUCCAGUUGGGCAGGUCACAGAUUACGGCUUCCUGAACCUGGGUGGGACACCCACCAAAAGAAUGCUCAUGUACAACUUCCCCACAAGAGCAGGCCAGUGUGGCGGGGUGCUCAUGUCCACUGGCAAAGUUUUGGGAAUCCAUGUUGGUGGAAAUGGCCAUCAAGGCUUCUCAGCAGCACUUCUCAAACACUACUUCAAUGAUGAACAAGGAGAGAUUGAGUUCAUUGAGAGUUCAAAGGAAGCAGGGUUCCCAAUCAUCAACGCACCCAGCAAAACCAAGCUGGAGCCAAGUGUCUUCCACCAAGUGUUUGAAGGCAACAAAGAGCCAGCAGUCCUCAGGAACAGUGACCCACGUCUCAAAGCCAAUUUCGAGGAGGCCAUCUUUUCCAAAUACAUUGGGAAUGUCAACACACACAUCGAUGAAUACAUGUUGGAGGCUGUUGACCAUUAUGCCGGACAAUUGGCCACCCUUGACAUCAGCACUGAACCAAUGAAGUUGGAGGAUGCUGUGUACGGCACUGAAGGCCUUGAAGCUCUUGACUUGACAACAAGUGCAGGCUACCCCUAUGUCGCACUGGGGAUCAAGAAGAGAGACAUCUUCUCGAAGAAGACCAAGGACCUGACCAAGCUGAAAGAGUGCAUGGACAAGUAUGGCCUGAAUCUUCCAAUGGUGACAUACGUGAAAGAUGAACUCAGAUCU

CpG & UpA-low Region 1 (SEQ ID NO 7)

GUCGACUCAGUGGUGCCAGUCAACAACAUCAAAGUCAACCUGCAAAGCAUGGAUGCUUAUCACAUUGAGGUCAACACAGGGAACCACCAGGGGGAAAAGAUUUUUGCUUUCCAAAUGCAGCCUGGGUUGGAGUCUGUUUUCAAGAGAACCCUGAUGGGGGAGAUUCUGAAUUAUUAUGCACACUGGUCAGGGAGCAUCAAGCUGACAUUCACAUUUUGUGGAUCUGCCAUGGCAACUGGAAAACUCUUGUUGGCUUAUUCACCACCAGGUGCUGAUGUGCCUGCAACCAGGAAACAGGCCAUGUUGGGCACACACAUGAUUUGGGACAUUGGGCUUCAGUCCAGCUGUGUUUUGUGCAUCCCAUGGAUCAGUCAGACACACUACAGGUUGGUGCAACAAGAUGAAUACACAAGUGCAGGCAAUGUGACAUGUUGGUACCAAACAGGAAUUGUGGUGCCCCCUGGCACUCCAAACAAGUGUGUUGUGCUUUGUUUUGCAUCAGCUUGCAAUGAUUUCUCAGUCAGGAUGCUCAGGGACACCCCUUUCAUUGGACAAACAGCACUGCUGCAAGGAGACACAGAAACAGCCAUUGACAAUGCAAUUGCCAGGGUUGCAGACACUGUGGCAAGUGGUCCAAGCAAUUCAACCAGCAUCCCAGCACUCACAGCAGUUGAGACAGGUCACACCUCACAAGUGGAGCCCAGUGACACAAUGCAGACAAGACAUGUCAAAAACUACCACUCCAGGUCUGAGUCAACUGUGGAAAACUUUCUCAGCAGGUCAGCUUGUGUGUACAUUGAAGAGUACUACACCAAGGACCAAGACAAUGUCAACAGGUACAUGUCCUGGACAAUCAAUGCCAGAAGAAUGGUGCAAUUGAGGAGAAAGUUUGAGCUGUUCACAUACAUGAGAUUUGACAUGGAAAUCACCUUCGUGAUCACAAGCAGACAACUCCCUGGGACAAGCAUUGCACAAGACAUGCCACCACUCACCCACCAGAUCAUGUACAUUCCACCAGGUGGCCCAGUGCCAAACAGUGUCACAGAUUUUGCCUGGCAGACAUCAACAAACCCCAGCAUUUUCUGGACAGAAGGAAAUGCCCCACCAAGGAUGUCCAUUCCAUUCAUCAGCAUUGGCAAUGCAUACAGCAACUUCUAUGAUGGGUGGUCACACUUUUCCCAAAAUGGUGUGUAUGGAUACAAUGCCCUGAACAACAUGGGCAAGCUGUAUGCAAGACAUGUUAAC

CpG & UpA-low Region 2 (SEQ ID NO 8)

GAAUUCGCUGUUGCCAUGAUGAAGAGAAACUCAAGCACAGUGAAGACUGAGUAUGGUGAGUUCACCAUGCUGGGCAUCUAUGACAAGUGGGCAGUUUUGCCAAGGCAUGCCAAACCUGGACCAACCAUCCUGAUGAAUGACCAAGAGGUUGGGGUGUUGGAUGCCAAGGAACUGGUGGACAAGGAUGGCACCAACCUGGAGCUGACACUUCUCAAGUUGAACAGAAAUGAGAAGUUCAGAGACAUCAGAGGCUUCUUGGCCAAGGAGGAAGUGGAAGUCAAUGAGGCUGUGCUGGCAAUCAACACCAGCAAGUUUCCCAACAUGUACAUUCCAGUGGGGCAGGUGACAGAUUAUGGCUUCCUGAACCUGGGUGGAACACCCACCAAAAGAAUGCUCAUGUACAACUUCCCCACAAGAGCAGGCCAGUGUGGAGGGGUUCUCAUGUCCACUGGCAAAGUUUUGGGAAUCCAUGUUGGUGGAAAUGGCCAUCAAGGCUUCUCAGCAGCACUUCUCAAACACUACUUCAAUGAUGAACAAGGAGAGAUUGAGUUCAUUGAGAGUUCAAAGGAAGCAGGGUUCCCAAUCAUCAAUGCACCCAGCAAAACCAAGCUGGAGCCAAGUGUCUUCCACCAAGUGUUUGAAGGCAACAAAGAGCCAGCAGUCCUCAGGAACAGUGACCCAAGGCUCAAAGCCAAUUUUGAGGAGGCCAUCUUUUCCAAAUACAUUGGGAAUGUCAACACACACAUUGAUGAAUACAUGUUGGAGGCUGUUGACCAUUAUGCAGGACAAUUGGCCACCCUGGACAUCAGCACUGAACCAAUGAAGUUGGAGGAUGCUGUGUAUGGCACUGAAGGCCUUGAAGCUCUUGACUUGACAACAAGUGCAGGCUACCCCUAUGUGGCACUGGGGAUCAAGAAGAGAGACAUCCUCUCAAAGAAGACCAAGGACCUGACCAAGCUGAAAGAGUGCAUGGACAAGUAUGGCCUGAAUCUCCCAAUGGUGACAUAUGUGAAAGAUGAACUCAGAUCU

RNA Structure Prediction and Sequence Variability

Prototype sequences of each species B serotype (www.picornaviridae.com) were scanned for RNA secondary structure using the program Folding Energy Scan in the SSE package (Simmonds 2012, BMC Research Notes 5: 50), using 200-base fragments incrementing by 152 bases and 50 sequence-order-randomised controls generated with the algorithm NDR, which preserves the dinucleotide frequencies of the native sequence (Simmonds et al. 2004, RNA-Publ. RNA Soc. 10: 1337-1351). Mean minimum folding energy difference (MFED) values for each fragment were plotted against the midpoint of each fragment to localise areas of sequence-order-dependent RNA secondary structure. MFEDs were similarly calculated for the reverse complement of each genome sequence. Synonymous sequence variability was determined by measuring mean pairwise distances using the program Sequence Scan in the SSE package.
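The windowing and MFED arithmetic can be sketched in Python as follows. Minimum folding energies must come from an RNA folding program and are treated here as inputs; MFED is computed as the percentage difference between the native MFE and the mean MFE of the shuffled controls. Function names and toy values are hypothetical, and the step size is left as a parameter because the text gives 152 bases here and 48 bases in the FIG. 8 legend.

```python
from statistics import mean


def fragments(seq, step, size=200):
    """Yield (midpoint, fragment) for windows of `size` bases advancing by `step`."""
    for start in range(0, len(seq) - size + 1, step):
        yield start + size // 2, seq[start:start + size]


def mfed(native_mfe, shuffled_mfes):
    """MFE difference (%) between a native fragment and its shuffled controls.

    MFEs are negative (kcal/mol), so a native fragment that folds more stably
    than its controls gives a positive MFED.
    """
    ref = mean(shuffled_mfes)
    return 100.0 * (native_mfe - ref) / ref


def rolling_mean(values, k=5):
    """Smooth a profile by averaging k consecutive fragment values."""
    return [mean(values[i:i + k]) for i in range(len(values) - k + 1)]


# Hypothetical values: native MFE of -60 vs 50 shuffled controls averaging -50.
print(mfed(-60.0, [-50.0] * 50))
```

MFED values near zero, as reported for most of the E7 coding region, indicate no sequence-order-dependent structure beyond that expected from composition.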

Clone Construction and Recovery of Mutant Viruses

The full-length E7 cDNA clone pT7:E7, under the control of a T7 promoter, was used for this study. Mutant E7 constructs with altered CpG/UpA content were generated by ordering custom DNA sequences (GeneArt, Life Technologies, Paisley, UK). Sequences were provided in standard antibiotic-resistant cloning vectors and were cloned into pT7:E7. All clones were sequenced over the insert regions prior to further applications. To recover the mutant viruses with altered CpG/UpA content, assembled plasmids were linearised using NotI and a T7 transcription reaction was carried out to create RNA using a MEGAscript T7 in vitro transcription kit (Ambion). 100 ng of RNA was transfected into RD cells using Lipofectamine 2000 (Invitrogen) according to the manufacturer's instructions. The resulting cell lysates were used to generate passage 1 stocks by re-infecting RD cells. Viral titres were determined by TCID₅₀ titration in RD cells.
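The text does not state how TCID₅₀ endpoints were calculated. As an assumption, the Python sketch below uses the Spearman-Kärber formula, one common method, in which the log10 endpoint dilution is x0 − d/2 + d·Σp (x0 being the highest dilution with all wells positive, d the log10 step and p the proportion of positive wells from x0 onwards). The function name and example plate are hypothetical.

```python
import math


def spearman_karber_log10_titre(log10_dilutions, positive, total, inoculum_ml):
    """Estimate log10 TCID50 per ml with the Spearman-Kaerber formula.

    log10_dilutions: reciprocal log10 dilutions in increasing order, evenly
    spaced (e.g. 1, 2, 3 for 10^-1, 10^-2, 10^-3).
    positive/total: wells showing CPE / wells inoculated at each dilution.
    Assumes the series starts with at least one fully positive dilution and
    ends with a fully negative one, as the method requires.
    """
    d = log10_dilutions[1] - log10_dilutions[0]  # log10 dilution step
    idx0 = None
    for i, (p, n) in enumerate(zip(positive, total)):
        if p == n:
            idx0 = i  # highest dilution so far with every well positive
        else:
            break
    if idx0 is None:
        raise ValueError("no dilution with all wells positive")
    x0 = log10_dilutions[idx0]
    sum_p = sum(p / n for p, n in zip(positive[idx0:], total[idx0:]))
    endpoint = x0 - d / 2 + d * sum_p  # log10 of the 50% endpoint dilution
    return endpoint - math.log10(inoculum_ml)  # convert to a per-ml titre


# Hypothetical plate: 8 wells per dilution, 0.1 ml inoculum per well.
titre = spearman_karber_log10_titre([1, 2, 3, 4, 5, 6], [8, 8, 8, 8, 4, 0], [8] * 6, 0.1)
print(f"10^{titre:.2f} TCID50/ml")
```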

Replication Phenotype

RD cells were seeded at 5×10⁵ cells per well in 6-well plates and subsequently infected with the WT or CpG/UpA mutants at an MOI of 0.01 per cell for 1 hour, before the inoculum was removed and the cells washed. Samples were then withdrawn at given time points (12, 18, 24, 30 and 42 hours post infection) and the viral titre determined by TCID₅₀. The assay was performed in triplicate per virus. For plaque assays, confluent RD cells in 100 mm dishes or 6-well plates were inoculated with virus in DMEM and incubated for 1 hour at 37° C. with occasional rocking. The inoculum was removed and replaced with an overlay consisting of 2% Methocel MC (Sigma) in DMEM. Plates were incubated for 96 hours at 37° C., fixed with 3.5% formaldehyde and stained with 0.1% crystal violet. Plaque sizes were quantified using ImageJ software.
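The inoculum required for a given MOI is simple arithmetic, sketched below in Python. The sketch assumes, as the text implicitly does, that one TCID₅₀ is counted as one infectious unit; the stock titre in the example is hypothetical.

```python
def inoculum_volume_ml(cells, moi, titre_per_ml):
    """Volume of virus stock delivering `moi` infectious units per cell."""
    return cells * moi / titre_per_ml


# 5x10^5 cells at MOI 0.01 need 5,000 infectious units; with a hypothetical
# stock of 10^6 TCID50/ml this is 0.005 ml (5 ul) per well.
volume = inoculum_volume_ml(5e5, 0.01, 1e6)
print(volume)
```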

Quantification of Viral RNA in Infected Cells

The load of viral RNA in infected RD cells was analysed using qRT-PCR. RNA was isolated from cells using the RNAspin Mini Kit (GE Healthcare) or from viral supernatant using the QIAamp Viral RNA Mini Kit (Qiagen). Reverse transcription was performed using M-MLV reverse transcriptase (Promega) and random primers. E7 cDNA was then quantified by qRT-PCR using primers annealing to the 5′ UTR region (sense: TCCGGCCCCTGAATGCGGCTAA (SEQ ID NO 9); antisense: CACCCAAAGTAGTCGGTTCCGC (SEQ ID NO 10)). Reactions were carried out using a Sensifast SYBR Mi-Rox Kit (Bioline) and a Rotor-Gene Q cycler (Qiagen), with the following cycling conditions: 95° C. for 2 minutes, then 40 cycles of 95° C. for 5 seconds, 60° C. for 10 seconds and 72° C. for 20 seconds. A standard curve for E7 RNA using a quantified PCR product was run in parallel, allowing quantification of viral copy number. The RNA to infectivity ratio was determined by extracting RNA from 5000 TCID₅₀ units per virus and performing quantitative RT-PCR against a standard curve.
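Absolute quantification against a standard curve can be sketched in Python as a linear fit of Ct against log10 copy number followed by interpolation. This is a generic illustration on idealised, hypothetical data, not the analysis software used; the RNA to infectivity ratio is taken as the measured genome copies divided by the 5000 TCID₅₀ extracted.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 corresponds to 100% efficiency
    return slope, intercept, efficiency


def copies_from_ct(ct, slope, intercept):
    """Interpolate genome copies for an unknown sample from its Ct."""
    return 10 ** ((ct - intercept) / slope)


def rna_to_infectivity_ratio(genome_copies, tcid50_extracted=5000):
    """Genome copies per TCID50, given the infectious units extracted."""
    return genome_copies / tcid50_extracted


# Idealised dilution series of a quantified PCR product (hypothetical Ct values).
standards = [2, 3, 4, 5, 6]
cts = [40 - 3.32 * x for x in standards]
slope, intercept, eff = fit_standard_curve(standards, cts)
print(round(slope, 2), round(eff, 3), copies_from_ct(26.72, slope, intercept))
```

A slope near −3.32 corresponds to roughly 100% amplification efficiency; for example, 1.77×10⁶ genome copies recovered from 5000 TCID₅₀ would give a ratio of 354.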

Replicon Construction and Replication Kinetics

To accurately quantify intracellular viral replication, the pRiboE7luc replicon plasmid was used. This contains a version of the E7 genome in which the structural genes (nucleotides 753 to 3118) are replaced with the 1704 bp firefly luciferase gene. In order to minimise frequencies of CpG and UpA dinucleotides within the luciferase gene, an alternative luciferase gene was designed using the same method as that described for Regions 1 and 2, and ordered as a custom DNA sequence. As before, the amino acid sequence remained unchanged. The custom luciferase gene also contained a CpG- and UpA-low 72 bp linker sequence at the 3′ end to allow cloning into the SanDI restriction site at nucleotide 3191 of the E7 genome. The sequence was cloned into pT7:E7 using the unique restriction sites KasI (genome position 781) and SanDI. To create replicons containing the additional Region 2 CpG-low or UpA-low inserts, a 3235 bp section of the replicon directly 3′ of the luciferase gene was excised using the SanDI and BglII restriction enzymes. This was then replaced with the equivalent sections of the previously described R1/R2 CpG-low or R1/R2 UpA-low constructs, containing the modified Region 2 inserts. Replicon plasmids were linearised using NotI and RNA was generated in a T7 transcription reaction.
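When an insert is redesigned, one practical check (assumed here, not described in the text) is that the recoded sequence still carries its flanking cloning sites and has not acquired internal copies of them. The Python sketch below scans for the sites of enzymes named in this section and above; all are palindromic, so scanning one strand suffices. SanDI is left out of the table, which can be extended as needed.

```python
import re

# Recognition sequences (DNA alphabet) of enzymes named in the text. All are
# palindromic, so scanning one strand finds every site.
SITES = {
    "SalI": "GTCGAC",
    "HpaI": "GTTAAC",
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "KasI": "GGCGCC",
    "NotI": "GCGGCCGC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each site in seq."""
    dna = seq.upper().replace("U", "T")
    # A zero-width lookahead also reports overlapping matches.
    return {name: [m.start() for m in re.finditer(f"(?={site})", dna)]
            for name, site in sites.items()}


# Toy fragment: first and last bases of WT Region 1 (SEQ ID NO 1) joined together.
print(find_sites("GUCGACUCCGUGGUGCCCGUCAUGUUAAC"))
```

On this toy fragment, SalI is found at the start and HpaI at the end, consistent with the Region 1 boundaries described earlier.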

Assays were performed by transfecting 50 ng of replicon RNA into RD cells seeded at 3×10⁴ cells per well in 96-well plates. At given time points after transfection (1, 4, 6, 8 and 12 hours), luciferase assays were carried out using the Luciferase Assay System (Promega) according to the manufacturer's instructions. Cells were lysed using Passive Lysis Buffer and the cell lysate was transferred to opaque 96-well plates for luminescence analysis using the GloMax-Multi Detection System (Promega).
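Normalising luminescence to the mock-transfected control (as in FIG. 5) and summarising replicates as mean and standard error can be sketched in Python as follows. Dividing by the mean mock signal is an assumed form of normalisation, and the readings are hypothetical.

```python
import math
from statistics import mean, stdev


def fold_over_mock(sample_rlu, mock_rlu):
    """Luminescence of each replicate relative to the mean mock-transfected signal."""
    background = mean(mock_rlu)
    return [x / background for x in sample_rlu]


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical relative light units from three biological replicates.
rel = fold_over_mock([200.0, 400.0, 600.0], [100.0, 100.0, 100.0])
print(rel, mean_sem(rel))
```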

Sequencing of Individual Virus Genomes

Viral RNA was isolated from E7 WT, R1/R2 CpG-high or R1/R2 UpA-high virus stocks generated in RD cells, and cDNA was created. Nested primers were designed to amplify a ~500 bp section of the modified Region 1 (nucleotides 1835-2363) and an unmodified region of E7 (nucleotides 3241-3723). Primer sequences are given in Table 2. The proofreading enzyme PfuTurbo DNA Polymerase (Agilent) was used to amplify the two sections from each cDNA. The products were purified, cloned into a TA vector (pGEM-T Easy, Promega) and transformed into competent E. coli, generating a separate colony for each copy of the original viral cDNA. The 500 bp inserts were sequenced using M13 primers.

TABLE 2 Nested primers used in sequencing individual viral genomes

| Region | Virus type | Primer | Nucleotide position | Sequence |
|---|---|---|---|---|
| 1 | All | Outer, sense | 1809 | CCCAATTTGATGTAACACCACACATGG (SEQ ID NO 11) |
| 1 | All | Inner, sense | 1835 | GATATTCCAGGCGAAGTACACAACC (SEQ ID NO 12) |
| 1 | EV7 WT | Outer, antisense | 2343 | CAAAGCACTACACACTTATTTGGAG (SEQ ID NO 13) |
| 1 | R1/R2 CpG-high | Outer, antisense | 2382 | ATTCGAACGGAGAAATCGTTAC (SEQ ID NO 14) |
| 1 | R1/R2 UpA-high | Outer, antisense | 2388 | TCCCTTAGCATACGTACTGAGAAAT (SEQ ID NO 15) |
| 1 | EV7 WT | Inner, antisense | 2313 | GCACCACTATTCCTGTTTGGT (SEQ ID NO 16) |
| 1 | R1/R2 CpG-high | Inner, antisense | 2348 | AACAAAGCACGACGCACTTATT (SEQ ID NO 17) |
| 1 | R1/R2 UpA-high | Inner, antisense | 2363 | CATTACAAGCTGATCCAAAACATAG (SEQ ID NO 18) |
| Unmodified | All | Outer, sense | 3210 | TGAGCCCGTACATCAAATCA (SEQ ID NO 19) |
| Unmodified | All | Inner, sense | 3241 | TTTTAACCCCACGAACCTGA (SEQ ID NO 20) |
| Unmodified | All | Outer, antisense | 3785 | TTGCCGAGTTGTTCGACATA (SEQ ID NO 21) |
| Unmodified | All | Inner, antisense | 3723 | CAAGTCACGGATGTCTGCAA (SEQ ID NO 22) |
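Antisense primers in Table 2 anneal to the reverse complement of the viral sequence. A small Python helper can confirm a binding site; the function names are illustrative. For instance, the EV7 WT inner antisense primer (SEQ ID NO 16) matches a stretch of WT Region 1 (SEQ ID NO 1).

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(primer):
    """Reverse complement of a DNA primer (U is treated as T)."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def binding_site(primer, template_rna, antisense=True):
    """0-based start of the primer's target in an RNA template, or -1 if absent.

    An antisense primer anneals to the genome, so its reverse complement is
    searched for; a sense primer is searched for directly.
    """
    template = template_rna.upper().replace("U", "T")
    target = reverse_complement(primer) if antisense else primer.upper()
    return template.find(target)


# Stretch of WT Region 1 (SEQ ID NO 1) and the EV7 WT inner antisense primer.
fragment = "GUGUUGGUACCAAACAGGAAUAGUGGUGCCCCCUGGCACUCCAAAUAAGUGUGUAGUGCUUUGUUUUGC"
print(binding_site("GCACCACTATTCCTGTTTGGT", fragment))
```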

Competition Assays

Equal titres of wild type (WT) and mutant virus (MOI = 0.01) were applied simultaneously to RD cells in 24-well plates. Following the development of cytopathic effect (CPE), the supernatant was frozen, thawed and applied to fresh RD cells. This was continued for 10 passages and was carried out in triplicate for each assay. For the pairwise competition assay, RD cells were inoculated with paired combinations of 7 viruses, giving 21 combinations in total. Each pairwise assay was carried out in a single well and passaged through RD cells 10 times. RNA was isolated from the final supernatants, cDNA was generated and nested PCR was carried out to amplify either Region 1 or Region 2 using the following primers:

-   Region 1 sense (outer): CCCAATTTGATGTAACACCACACATGG (SEQ ID NO 23)
-   Region 1 sense (inner): GATATTCCAGGCGAAGTACACAACC (SEQ ID NO 24)
-   Region 1 antisense (outer): CCCATACTCGGATGTGCTTGGG (SEQ ID NO 25)
-   Region 1 antisense (inner): CACTCGGATTGTGCTTGACATCTG (SEQ ID NO 26)
-   Region 2 sense (outer): CAAGGAGCATACACAGGAATACC (SEQ ID NO 27)
-   Region 2 sense (inner): GGTACCTACTCTTAGGCAAGCA (SEQ ID NO 28)
-   Region 2 antisense (outer): GAATGTCTGCCTCATCGCCAACT (SEQ ID NO 29)
-   Region 2 antisense (inner): AAGCTGGACGCTTCAATGAGCCT (SEQ ID NO 30)

The amplified fragment was then subjected to selective digestion to determine the composition of each virus in the final supernatant. The restriction enzymes used for each competition assay are given in Table 3. Relative band intensity was measured using ImageJ software.
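The pairwise design and the read-out of a selective digest can be sketched in Python as follows. The seven virus names are those in the pairwise section of Table 3; estimating each virus's share from summed band intensities assumes that stain signal is proportional to DNA mass, which the text does not state.

```python
from itertools import combinations

# Viruses in the pairwise section of Table 3.
VIRUSES = [
    "WT", "R1/R2 Permuted", "R1 CpG/UpA-low", "R2 CpG/UpA-low",
    "R1/R2 CpG-low", "R1/R2 UpA-low", "R1/R2 CpG/UpA-low",
]
PAIRS = list(combinations(VIRUSES, 2))  # every unordered pair, 7 choose 2


def fraction_with_site(cut_fragment_intensities, uncut_intensity):
    """Share of the amplicon from the virus that carries the restriction site.

    Cut fragments are summed because, under the stated assumption, band
    intensity scales with DNA mass.
    """
    cut = sum(cut_fragment_intensities)
    return cut / (cut + uncut_intensity)


print(len(PAIRS), fraction_with_site([30.0, 20.0], 150.0))
```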

TABLE 3 Enzymes used in selective digests for competition assays

| Virus 1 | Virus 2 | Region amplified | Enzyme | Restriction site |
|---|---|---|---|---|
| Individual competition experiments | | | | |
| WT | R1/R2 Permuted | 2 | HindIII | In R1/R2 Permuted |
| WT | R1/R2 CpG-high | 1 | BamHI | In R1/R2 CpG-high |
| WT | R1/R2 UpA-high | 2 | ScaI | In R1/R2 UpA-high |
| WT | R1/R2 CpG-low | 2 | SphI | In R1/R2 CpG-low |
| WT | R1/R2 UpA-low | 2 | EcoRV | In WT |
| Pairwise competition experiments | | | | |
| WT | R1/R2 Permuted | 2 | HindIII | In R1/R2 Permuted |
| WT | R1 CpG/UpA-low | 1 | EcoRV | In WT |
| WT | R2 CpG/UpA-low | 2 | EcoRV | In WT |
| WT | R1/R2 CpG-low | 2 | EcoRV | In R1/R2 CpG-low |
| WT | R1/R2 UpA-low | 2 | EcoRV | In WT |
| WT | R1/R2 CpG/UpA-low | 2 | SphI | In WT |
| R1/R2 Permuted | R1 CpG/UpA-low | 1 | SphI | In R1/R2 Permuted |
| R1/R2 Permuted | R2 CpG/UpA-low | 2 | HindIII | In R1/R2 Permuted |
| R1/R2 Permuted | R1/R2 CpG-low | 2 | HindIII | In R1/R2 Permuted |
| R1/R2 Permuted | R1/R2 UpA-low | 2 | HindIII | In R1/R2 Permuted |
| R1/R2 Permuted | R1/R2 CpG/UpA-low | 2 | HindIII | In R1/R2 Permuted |
| R1 CpG/UpA-low | R2 CpG/UpA-low | 1 | EcoRV | In R2 CpG/UpA-low |
| R1 CpG/UpA-low | R1/R2 CpG-low | 2 | SphI | In R1/R2 CpG-low |
| R1 CpG/UpA-low | R1/R2 UpA-low | 2 | EcoRV | In R1 CpG/UpA-low |
| R1 CpG/UpA-low | R1/R2 CpG/UpA-low | 2 | EcoRV | In R1 CpG/UpA-low |
| R2 CpG/UpA-low | R1/R2 CpG-low | 2 | EcoRV | In R1/R2 CpG-low |
| R2 CpG/UpA-low | R1/R2 UpA-low | 2 | SphI | In R2 CpG/UpA-low |
| R2 CpG/UpA-low | R1/R2 CpG/UpA-low | 1 | EcoRV | In R2 CpG/UpA-low |
| R1/R2 CpG-low | R1/R2 UpA-low | 2 | EcoRV | In R1/R2 CpG-low |
| R1/R2 CpG-low | R1/R2 CpG/UpA-low | 2 | EcoRV | In R1/R2 CpG-low |
| R1/R2 UpA-low | R1/R2 CpG/UpA-low | 2 | SphI | In R1/R2 CpG/UpA-low |

Early Intra-Cellular Replication Kinetics

To induce synchronous infection, RD cells in 24-well plates were cold-treated at 4° C. for 5 minutes before inoculation with wild type or mutant virus normalised for genome copy number. A total of 2×10⁸ genome copies (1000 per cell) was applied to each well, and the cells were maintained at 4° C. for a further 30 minutes before being moved to 37° C. Cells were washed twice with PBS and then trypsinised 1 hour or 4 hours post infection. The cells were then pelleted and washed again in PBS before RNA was isolated and viral copy number determined by qRT-PCR. Copy number was normalised against the housekeeping gene GAPDH (qRT-PCR primers: sense GAAATCCCATCACCATCTTCCAGG (SEQ ID NO 31); antisense GAGCCCCAGCCTTCTCCATG (SEQ ID NO 32)).
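The inoculum arithmetic and a simple normalisation can be written out in Python as below. The text states only that copy number was normalised against GAPDH; expressing viral copies per GAPDH copy is an assumed form of that normalisation, and the example copy numbers are hypothetical.

```python
def cells_per_well(genome_copies_per_well, copies_per_cell):
    """Number of cells implied by a total inoculum and a per-cell dose."""
    return genome_copies_per_well / copies_per_cell


def per_gapdh(viral_copies, gapdh_copies):
    """Viral genome copies per GAPDH transcript copy (assumed normalisation)."""
    return viral_copies / gapdh_copies


# 2x10^8 genome copies at 1000 per cell implies about 2x10^5 cells per well.
print(cells_per_well(2e8, 1000))
print(per_gapdh(5.0e4, 1.0e6))  # hypothetical qRT-PCR copy numbers
```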

R1 Transfection—Creating the Transcripts

RNA transcripts were made from Region 1 of the E7 WT and mutant viruses by linearising the original cloning plasmid containing the synthetic insert with HpaI and carrying out a T7 transcription reaction. The integrity of the 1.3 kb RNA transcripts was confirmed using an Agilent Bioanalyser. A549 cells in 24-well plates were transfected with 250 μl RNA using 1.5 μl Lipofectamine 2000 (Invitrogen) per well, and cellular RNA was harvested 6 hours later. Poly I:C (5 pg/well) was transfected as a positive control. Induction of IFNβ was analysed by qRT-PCR (primers: sense GACCAACAAGTGTCTCCTCCAAA (SEQ ID NO 33); antisense GAACTGCTGCAGCTGCTTAATC (SEQ ID NO 34)) using cycling conditions of 95° C. for 10 min, followed by 40 cycles of 95° C. for 15 s and 60° C. for 60 s. Copy number was normalised against GAPDH.

Results

Strategy for Maximising or Minimising CpG/UpA Content in Mutant Viruses

As in other small RNA viruses, CpG dinucleotides in the E7 genome were under-represented relative to the frequency expected from its G+C content, with an observed to expected ratio of 0.58 in the E7 coding sequence. UpA dinucleotides were also under-represented in the E7 genome (observed to expected ratio of 0.78).

To investigate whether CpG and UpA dinucleotide frequencies influenced the ability of E7 to replicate in vitro, we created a series of mutated viruses in which the frequencies of both dinucleotides were changed from their native levels. This was achieved using the reverse genetics system developed for enteroviruses, in the current study with the pT7:E7 infectious clone. RNA transcripts generated from a linearised plasmid containing the complete E7 genome sequence yield infectious virus for phenotypic characterisation after transfection into a wide range of mammalian cells.

To select sequences for mutagenesis, we sought to avoid regions of the genome that contained RNA elements required for replication or translation functions of the virus, such as the cis-replicating element embedded in the 2C coding sequence (Goodfellow et al. 2000, Journal of Virology 74: 4590-4600).

Although incompletely located and functionally characterised to date, the presence of required non-coding elements can be revealed through analysis of RNA secondary structure formation in these regions and through suppression of synonymous sequence variability that reflects non-coding functional constraints on sequence change in these regions (FIG. 8). By scanning an alignment of complete genome sequences of each of the currently described species B serotypes (including the pT7:E7 sequence of the infectious clone), an area of marked suppression of sequence variability was found to co-localise with the CRE in the 2C region. Calculation of folding energies to detect RNA secondary structure in the genome showed prominent regions of structure in the 5′UTR, 3′UTR and the CRE. The remainder of the genome showed no evidence for consistent RNA structure formation (MFED values around zero).

The combination of unrestricted synonymous variability and an absence of RNA secondary structure over long stretches of the E7 genome provided opportunities for altering dinucleotide frequencies without impairing virus replication for other reasons. Two genome regions (positions 1878-3119 and 5403-6462) were selected for mutagenesis based on these criteria. Sequences were modified by replacing nucleotides within CpG or UpA dinucleotides with alternative bases that preserved coding. It was possible to remove all CpG dinucleotides from both regions and to reduce UpA frequencies to approximately one third of wild type levels (Table 1; CpG-low and UpA-low insert sequences). As an alternative strategy to maximise frequencies of these dinucleotides, every site that could tolerate the creation of these dinucleotides without changing coding was identified and mutated, creating sequences with 2.5 to 3 times their naturally occurring frequencies (Table 1; CpG-high, UpA-high). To control for the possibility that sequence disruption itself damaged or destroyed undetected replication elements within Regions 1 and 2, sequences from these regions were permuted using the algorithm CDLR in the SSE sequence package (Permuted in Table 1). This randomises the order of codons within the sequence while maintaining coding and dinucleotide frequencies, through swaps between equivalently coding triplets in the same upstream and downstream dinucleotide contexts. All insert sequences were then synthesised and cloned into the pT7:E7 infectious clone using naturally occurring restriction sites. Clones were created with one or both regions replaced by modified insert sequences.
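The permutation principle described for CDLR can be illustrated in Python. This is a simplified re-implementation of the idea as described here, not the SSE program: two non-adjacent codons are swapped only if they encode the same amino acid and share the same upstream and downstream neighbouring bases, which guarantees that the protein sequence and every dinucleotide count are unchanged. Function names and the random test ORF are illustrative.

```python
import itertools
import random
from collections import Counter

# Standard genetic code: codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def translate(orf):
    """Translate an in-frame RNA ORF."""
    return "".join(CODON_TO_AA[orf[i:i + 3]] for i in range(0, len(orf), 3))


def dinucleotides(seq):
    """Counter of all overlapping dinucleotides."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def permute_codons(orf, n_swaps=10000, seed=0):
    """Shuffle codon order while preserving protein and dinucleotide counts.

    Codons at non-adjacent positions i and j are swapped only when they encode
    the same amino acid and have identical upstream and downstream
    neighbouring bases, so every junction dinucleotide is conserved.
    """
    rng = random.Random(seed)
    codons = [orf[k:k + 3] for k in range(0, len(orf), 3)]
    n = len(codons)

    def context(k):
        up = codons[k - 1][-1] if k > 0 else ""
        down = codons[k + 1][0] if k + 1 < n else ""
        return CODON_TO_AA[codons[k]], up, down

    for _ in range(n_swaps):
        i, j = rng.randrange(n), rng.randrange(n)
        if abs(i - j) > 1 and context(i) == context(j):
            codons[i], codons[j] = codons[j], codons[i]
    return "".join(codons)


# Hypothetical 300-codon ORF built from random sense codons.
rng = random.Random(1)
sense = [c for c, aa in CODON_TO_AA.items() if aa != "*"]
orf = "".join(rng.choice(sense) for _ in range(300))
shuffled = permute_codons(orf)
print(translate(shuffled) == translate(orf), dinucleotides(shuffled) == dinucleotides(orf))
```

Because each accepted swap exchanges codons that sit in identical base contexts, the junction dinucleotides on both sides are simply traded between the two positions, so the overall counts cannot change.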

Replicative Fitness of Mutants with Modified CpG/UpA Frequencies

Wild type E7 and mutant viruses were recovered in tissue culture by transfecting whole-genome RNA transcripts obtained through T7 transcription of pT7:E7. Recovered virus was then titred by TCID50 and used in subsequent experiments.

i) Particle to infectivity ratio. RNA copy to infectivity ratios were determined by extracting viral RNA from a known infectious titre of each virus and carrying out qRT-PCR. The ratios are shown in FIG. 1. The RNA to infectivity ratio of the permuted double region mutant (247±9.2) was similar to that of the WT E7 virus (354±8.0), indicating that the process of synonymous nucleotide replacement itself does not affect the RNA to infectivity ratio where dinucleotide frequencies are kept constant. In contrast, increasing either the CpG or UpA dinucleotide frequency drastically affected the RNA to infectivity ratio, with the value for the double region CpG-high mutant (128,840±31,698.6) being approximately 350 times the WT value and that for the UpA-high mutant (6233±883.6) approximately 20 times higher. The RNA to infectivity ratios for the double region CpG-low and UpA-low mutants were comparable to the WT.
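The fold differences quoted above follow directly from the mean ratios. The short Python sketch below assumes the ratio is simply genome copies measured by qRT-PCR divided by infectious units (TCID50) from the same sample, and reproduces that arithmetic using the means reported in the text; the helper names are hypothetical.

```python
def rna_to_infectivity(rna_copies: float, infectious_units: float) -> float:
    """Genome copies (qRT-PCR) per infectious unit (e.g. TCID50) in one sample."""
    if infectious_units <= 0:
        raise ValueError("infectious titre must be positive")
    return rna_copies / infectious_units


def fold_vs_wt(ratio: float, wt_ratio: float) -> float:
    """Express a mutant's ratio as a multiple of the WT ratio."""
    return ratio / wt_ratio


# Mean RNA to infectivity ratios reported for the double region mutants (FIG. 1).
MEAN_RATIOS = {
    "WT E7": 354.0,
    "R1/R2 permuted": 247.0,
    "R1/R2 UpA-high": 6233.0,
    "R1/R2 CpG-high": 128840.0,
}

for virus, ratio in MEAN_RATIOS.items():
    print(f"{virus}: {fold_vs_wt(ratio, MEAN_RATIOS['WT E7']):.1f}-fold WT")
```

With these means the CpG-high mutant comes out at roughly 364 times WT and the UpA-high mutant at roughly 18 times WT, in line with the approximate figures above, while the permuted control (about 0.7 times WT) stays close to WT.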

ii) Replication kinetics with low MOI infection. In a low-MOI multi-step infection, the growth kinetics of the E7 mutants were compared to those of the WT. Increasing the CpG or UpA dinucleotide frequency caused a severe attenuation of viral replication, resulting in a viral output 6854-fold lower for the R1/R2 CpG-high mutant than for the WT after 24 hours, and a 30-fold lower output for the R1/R2 UpA-high mutant (FIG. 2a). Mutant viruses replicated more slowly as well as producing a lower final output of infectious particles. The effect was dose-dependent, as viruses with only one region mutated tended to replicate better than the double region mutants. Increasing dinucleotide frequencies in Region 2 was more detrimental to viral replication than in Region 1, despite its shorter length (1 kb compared to 1.3 kb). R1 CpG-high mutants replicated only 144-fold less than wild type at 24 hours, whilst R2 CpG-high mutants replicated 1487-fold less (FIG. 2c). Amongst the UpA-high mutants, replication was actually improved by modifying R1, giving a 10-fold higher output than wild type, whilst the R2 mutant fared slightly worse than the double mutant. The replication rate of the R1/R2 permuted control was indistinguishable from wild type. Lowering the CpG and UpA dinucleotide frequency relative to the WT level actually had a positive effect on viral replication, albeit more subtle than for the high mutants (FIGS. 2b, c). The replication rates and final viral outputs of the CpG-low and UpA-low double mutants were similar to wild type; however, the replication of the R1/R2 CpG/UpA-low double mutant was 10-fold higher than that of the wild type at both 18 and 24 hours post infection.

iii) Plaque morphology. Increasing CpG and UpA frequency also negatively affected plaque area (FIGS. 3a-b). The size reduction in CpG-high mutants was dose-dependent, with the R1 mutant plaque area 3.5-fold lower than that of E7 WT and the R1/R2 mutant plaque area 8.8-fold lower. R1/R2 UpA-high plaques were on average 3.2-fold smaller than wild type, again demonstrating a less severe phenotype than the equivalent double region CpG-high mutant. The area of R1/R2 UpA-low plaques was comparable to WT, whilst the R1/R2 CpG-low mutant produced significantly larger plaques, 1.4-fold greater than WT.

iv) Replication kinetics of a sub-genomic replicon. The replication kinetics of CpG- and UpA-low mutants were further characterised using a sub-genomic replicon system expressing a luciferase gene, in order to provide a more sensitive measure of viral genome replication. Bioinformatic analysis of the original 1.7 kb firefly luciferase gene in pRiboE7luc revealed a strikingly high observed to expected CpG ratio of 1.242. This is characteristic of insect genomes, in which CpG frequency is not suppressed (Burge et al. 1992, Proceedings of the National Academy of Sciences of the United States of America 89:1358-1362). Despite the widespread use of such reporter systems, the results obtained in the current study and those of Burns et al. (2009) suggested that the high CpG ratio could drastically impede the replication rate of this viral replicon in mammalian cell lines. A replacement luciferase gene was therefore designed in which the CpG ratio was reduced to 0.013 and the UpA ratio to 0.145 (from 0.699) through synonymous substitution, as described previously. Following this, Region 2 of the resulting modified replicon was replaced with the CpG-low or UpA-low inserts used in generating the original double region mutants. Luminescence was then analysed over a 12-hour time course following transfection of each replicon (FIG. 5). Replacing the original insect luciferase gene with the synthetic CpG/UpA-low gene conferred a dramatic increase in replicative ability, giving a 100-fold difference in relative luminescence at 4 hours. The replication rate was increased further by the addition of the Region 2 CpG- or UpA-low inserts, by up to 6-fold relative to the pRiboE7luc CpG/UpA-low replicon after 6 (CpG-low) or 4 (UpA-low) hours. The results demonstrate that by reducing CpG or UpA frequencies below wild type levels, the replicative fitness of E7 can actually be improved in a cell culture environment.
Furthermore, the expression efficiency of transgenic reporter genes may be improved by at least 100-fold by optimising CpG and UpA frequencies according to the genetic system under study.
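The observed to expected ratios quoted above (for example 1.242 for CpG in the original luciferase gene) compare the frequency of a dinucleotide with the product of the frequencies of its two constituent bases. A minimal Python sketch of that standard calculation follows; it assumes frequencies are taken over the whole input sequence, and the exact normalisation used by the SSE package may differ slightly. The function name `dinucleotide_oe` and the example sequence are hypothetical.

```python
from collections import Counter


def dinucleotide_oe(seq: str, dinucleotide: str) -> float:
    """Observed/expected ratio: f(XY) / (f(X) * f(Y)).

    f(XY) is counted over the len(seq) - 1 overlapping dinucleotide positions
    and f(X), f(Y) over all len(seq) bases. T is treated as U.
    Returns NaN when either base is absent (expected frequency of zero).
    """
    seq = seq.upper().replace("T", "U")
    x, y = dinucleotide.upper().replace("T", "U")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    bases = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    expected = (bases[x] / n) * (bases[y] / n)
    if expected == 0:
        return float("nan")
    observed = pairs[x + y] / (n - 1)
    return observed / expected


orf = "AUGCGUACGUUAGCGCUAA"  # short made-up sequence for illustration
print(f"CpG O/E = {dinucleotide_oe(orf, 'CG'):.3f}")
print(f"UpA O/E = {dinucleotide_oe(orf, 'UA'):.3f}")
```

A ratio of 1 means the dinucleotide occurs as often as base composition predicts; vertebrate sequences typically fall well below 1 for CpG, whereas the value of 1.242 above indicates CpG slightly over-represented relative to composition.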

Investigation of Virus Particle Integrity

In order to determine whether the impaired replication rate observed in CpG- and UpA-high mutants was due to a reduction in the ability of virus particles to enter cells, the number of virus particles used to infect cells was compared with the number of intracellular viral genome copies present immediately post infection. One hour after a synchronous infection with 1000 virus particles per cell (as determined by qRT-PCR), the number of intracellular viral genome copies was found to be similar between viruses, with 42 per cell for wild type E7, 19 per cell for R1/R2 CpG-high and 36 per cell for R1/R2 UpA-high (FIG. 4). Four hours post infection, after initiation of viral genome replication, a clear differentiation was observed between viruses. The number of wild type genome copies had increased to 2362 per cell, whilst CpG-high copies remained at 58 per cell and UpA-high copies reached only 207 per cell. Increasing CpG or UpA dinucleotide frequencies therefore affects viral genome replication at an early stage post infection.
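The contrast between the two time points can be made explicit by expressing the 4-hour counts as fold increases over the 1-hour counts. The Python below is simple arithmetic on the per-cell values reported in the text; the helper name `fold_increase` is hypothetical.

```python
# Intracellular genome copies per cell at 1 h and 4 h post infection (FIG. 4).
COPIES_PER_CELL = {
    "WT E7": (42, 2362),
    "R1/R2 CpG-high": (19, 58),
    "R1/R2 UpA-high": (36, 207),
}


def fold_increase(early: float, late: float) -> float:
    """Fold change in genome copies between an early and a later time point."""
    return late / early


for virus, (h1, h4) in COPIES_PER_CELL.items():
    print(f"{virus}: {fold_increase(h1, h4):.1f}-fold increase from 1 h to 4 h")
```

On these figures WT genomes increased about 56-fold between 1 and 4 hours, compared with roughly 3-fold for R1/R2 CpG-high and under 6-fold for R1/R2 UpA-high, whereas the 1-hour counts (reflecting entry) differed by little more than 2-fold.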

Fitness Comparison of Modified Viruses using Competition Assays

The relative fitness of high and low mutant viruses compared to E7 WT was confirmed using competition assays. Following infection with an equal MOI of each virus and serial passage in tissue culture, R1/R2 CpG-high and R1/R2 UpA-high were each rapidly out-competed by the WT, becoming undetectable by PCR after 5 passages (FIG. 6). Further analysis of CpG-high mutants showed that the R1/R2 mutant was already being out-competed after 1 passage, whereas the R1 and R2 mutants were out-competed more slowly owing to their higher relative fitness. Similarly, the individual R1 and R2 UpA-high mutants were still abundant after 5 passages.

Confirming the replicative advantage revealed by the CpG-low and UpA-low replicons, the R1/R2 CpG- and UpA-low mutants demonstrated a higher relative fitness than WT, out-competing it completely after 15 passages and showing at least 90% prevalence after only 10 passages (FIG. 6). To investigate this phenomenon further, a pairwise competition experiment was carried out in which combinations of single or double region CpG- and/or UpA-low mutants were competed against one another, allowing a fitness ranking to be determined (FIGS. 7a-b). The R1/R2 CpG/UpA-low mutant had the highest fitness, completely out-competing almost all of the other viruses by passage 6. The double region CpG-low mutant ranked second, followed by the single region R1 CpG/UpA-low mutant. Lowering CpG/UpA frequency in Region 1 was shown to have more effect than in Region 2, as the R2 mutant was rapidly out-competed by R1 CpG/UpA-low as well as by the double region UpA-low mutant, an effect that might be expected given the relative sizes of the modified fragments (Region 1 is 1.3 kb whereas Region 2 is 1 kb). Reducing CpG frequency was shown to have a greater effect than reducing UpA frequency and, correspondingly, increasing the CpG level was more detrimental to viral replication than increasing UpA.
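The disclosure does not specify a model for the competition data. One simple way to summarise such passage data, assumed here purely for illustration, is to treat mutant and WT as having a constant fitness ratio w per passage, so that the mutant's odds (its proportion divided by that of WT) multiply by w at each passage. The Python sketch below, with hypothetical function names, estimates w from a starting and final proportion and predicts prevalence from w.

```python
def per_passage_fitness(p_start: float, p_end: float, passages: int) -> float:
    """Constant per-passage fitness ratio w implied by a change in prevalence.

    Assumed model: odds_end = odds_start * w ** passages,
    where odds = p / (1 - p) for the mutant's proportion p.
    """
    if not (0 < p_start < 1 and 0 < p_end < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if passages <= 0:
        raise ValueError("passages must be positive")
    odds_ratio = (p_end / (1 - p_end)) / (p_start / (1 - p_start))
    return odds_ratio ** (1 / passages)


def predicted_prevalence(p_start: float, w: float, passages: int) -> float:
    """Mutant proportion after a number of passages under the same model."""
    odds = p_start / (1 - p_start) * w ** passages
    return odds / (1 + odds)


# Equal starting MOI (p = 0.5) and 90% prevalence after 10 passages.
w = per_passage_fitness(0.5, 0.9, 10)
print(f"w = {w:.3f}")
print(f"predicted prevalence after 5 passages: {predicted_prevalence(0.5, w, 5):.3f}")
```

Under this assumed model, reaching 90% from an equal start in 10 passages corresponds to w of roughly 1.25, i.e. about a 25% advantage per passage; because the text reports at least 90% prevalence, this is a lower bound within the model. A constant w is a simplification, and real passage dynamics may differ.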

Effect of Dinucleotide Frequency Changes in other RNA Viruses

To investigate whether the replication enhancement observed in E7 extends to other virus systems, the inventor constructed mutants of Theiler's murine encephalomyelitis virus (TMEV), a picornavirus in the genus Cardiovirus, and of influenza A virus (IAV), with regions of the genome replaced with modified coding sequences. These were similarly designed to contain elevated or lowered CpG and UpA dinucleotide frequencies while retaining protein coding and avoiding areas of the genome containing known or suspected RNA secondary structures or packaging elements (IAV).

Replication-competent mutants of TMEV were constructed with the region of the genome between positions 5445 and 6702 replaced with modified sequences (numbering based on the TMEV GD7 clone [accession number X56019]). Mutants with elevated frequencies of CpG and UpA showed substantial impairment of virus replication (FIG. 9), while the CpG/UpA-low mutant showed enhanced replication compared to wild type (WT) virus. CpG- and UpA-high mutants showed elevated RNA/infectivity ratios compared to WT. The degree of replication enhancement or attenuation observed in TMEV was similar in extent to that of E7 mutants with comparable degrees of genome replacement (single region mutants).

Several mutants of IAV have been constructed in which one or more genome segments were replaced with modified insert sequences. As an example of the results obtained, mutants with a segment with increased CpG or UpA showed attenuated replication and an increased RNA/infectivity ratio. These changes in phenotype were comparable in magnitude to those observed in E7 (and TMEV). Because the replication cycle of IAV is substantially different from those of E7 and TMEV, this indicates that the restrictions imposed by possession of CpG and UpA dinucleotides on replication/gene expression likely represent fundamental aspects of RNA virus replication. Dinucleotide frequencies may therefore influence replication rates of a much wider range of mammalian, avian and plant viruses that show similar suppression of CpG and UpA dinucleotide frequencies.

Influence of Dinucleotide Frequencies on Reporter Gene Expression

A variety of genes are used as reporters or selectable markers in biotechnology, as components of expression vectors, transgenes and replicons. Reporter genes or selectable markers are frequently derived from prokaryotes (e.g. antibiotic resistance genes) or lower eukaryotes (e.g. luciferase, green fluorescent protein). Most derive from organisms with reduced or absent genomic DNA methylation and consequently lack the suppression of CpG dinucleotides observed in vertebrate sequences and in the RNA viruses infecting them. The inventor hypothesised that high CpG frequencies in commonly used reporter genes such as firefly luciferase (derived from the insect Photinus pyralis) may have a generic, harmful effect on gene expression and on the replicative ability of replicons containing them. The inventor has previously observed substantial enhancement in luciferase expression and replication of the E7 replicon through insertion of a zero-CpG, low-UpA replacement luciferase sequence. The inventor has now observed the same phenomenon in the HCV replicon.

The Con1 replicon is widely used to study the replication of hepatitis C virus (Lohmann et al. 1999, Science 285:110-113). A currently widely used Con1-derived construct (Krieger et al. 2001, J. Virol. 75:4614-4624) contains a luciferase reporter gene similar to that used in the E7 replicon, which shows similarly elevated CpG frequencies. The inventor replaced this with a CpG-zero, UpA-low synthetic sequence and compared luciferase expression with that of the parental construct.

The degree of replication enhancement of the HCV replicon exceeded even that of E7. Remarkably, in its unmodified form, the Con1 HCV replicon has been used in replication assays in academic research and by the pharmaceutical industry for antiviral development for 12-13 years without any recognition that its replication is fundamentally compromised by the inserted reporter gene (see FIG. 10). This underlines the novelty of the discovery described in the instant disclosure.

Similar modifications can be made to a red fluorescent protein (RFP)-expressing HCV replicon construct. In this specific case, RFP sequences commonly used in transgenes and other vectors show CpG frequencies of over 0.6 (observed to expected ratio), which potentially also influence their expression and mediate unintended cellular activation processes.

Not only does luciferase (and likely other high-CpG reporter genes) reduce the replication of replicons (e.g. E7 and HCV), but its intracellular expression likely also has a substantial effect on the non-physiological activation of cellular defence pathways (Atkinson et al. 2014, Nucleic Acids Research, gku075). This has potentially compromised studies of the effects of innate immune responses to viral replication in cells. Similar concerns about potential toxicity and cellular activation effects naturally arise when considering the use of these and other sequences with high CpG frequencies as selection or reporter genes in wider areas of biotechnology. The instability of many sequences used as transgenes may originate through recruitment of innate and inflammatory responses against cells expressing such reporter genes or selection markers.

CpG and UpA Removal to Enhance Virus Replication in the Manufacture of Inactivated Virus Vaccines

Quantitative PCR and infectivity assays have demonstrated accelerated replication of CpG/UpA-low mutants in multistep replication assays, but to reinforce this it is useful to show further that enhanced replication produces greater yields of the viral proteins that represent the protective component of a vaccine.

The inventor infected RD cells with wild type echovirus 7 and the CpG/UpA-low mutant. Cells were harvested at several time points after infection, and viral capsid protein extracted from cells and supernatant was quantified by Western blot using a specific anti-capsid monoclonal antibody (FIG. 11).

The CpG/UpA-low echovirus 7 mutant showed enhanced capsid protein expression throughout the time course of the experiment, with levels 2-fold higher than the WT control at 12 hours, increasing to 14.5-fold at 18 hours. Translated to a poliovirus system, this provides evidence that this mutational process can substantially improve inactivated virus vaccine production yields.

The experimental results depicted in FIG. 11 were obtained from a mutant E7 with approximately 30% of the genome replaced by CpG/UpA-low mutated sequences. Further enhancement of virus replication and viral protein production can almost certainly be achieved by replacing sequences in other parts of the coding region of the genome. In E7, and likely in poliovirus, it is typically possible to replace up to 80% of the genome with CpG/UpA-low sequences and achieve further enhancement of virus replication. For influenza A virus, it is expected that segments 1, 4, 5 and 6 (collectively approximately 43% of the genome) can be replaced with CpG/UpA-low sequences. Segments 4 and 6 encode the haemagglutinin and neuraminidase proteins that represent the principal protective components in the inactivated IAV vaccine.

Discussion

High Mutants

The first part of this study demonstrated that specifically increasing the frequency of CpG or UpA dinucleotides in E7 results in severe viral attenuation. Attenuation was characterised by a dramatic reduction in replication rate, smaller plaque area, a high RNA to infectivity ratio and a low competitive fitness relative to WT E7. The results agree with the outcome of previous studies in poliovirus, in which codon replacement or de-optimisation leading to an increase in CpG/UpA frequency correlated negatively with replicative fitness (Burns et al. 2009, J. Virol. 83:9957-9969; Coleman et al. 2008, Science 320:1784-1787). An increased RNA to infectivity ratio (i.e. reduced specific infectivity) due to higher CpG and UpA frequencies was also observed in poliovirus (Burns et al. 2009, J. Virol. 83:9957-9969). Increasing CpG and UpA in E7 had a greater effect than in poliovirus, where introducing 105 new CpG dinucleotides in the capsid region led to approximately a 3-fold reduction in infectivity output (Burns et al. 2009, J. Virol. 83:9957-9969). In E7, introducing 129 new CpGs in the capsid region led to a 74-fold reduction in infectivity titre, whilst introducing 116 CpGs into the region of non-structural genes caused a 7500-fold reduction. Similar experiments are currently underway using Theiler's murine encephalomyelitis virus (TMEV) and influenza A virus, in which increased CpG or UpA frequency also results in a decrease in viral replication (data not shown). Our results show definitively that experimental attenuation of viral fitness is specifically related to CpG and UpA frequencies and is irrespective of %G+C content, also dispelling theories that fitness is determined by non-preferred codon replacement itself or by codon pair bias (Coleman et al. 2008, Science 320:1784-1787; Burns et al. 2009, J. Virol. 83:9957-9969). The permuted control used in this study rules out the possibility that attenuation is due to disruption of RNA secondary structure.
Furthermore, replication defects are unlikely to result from a decrease in translational efficiency, as previous studies have shown that protein synthesis levels are unaltered even for highly attenuated viruses (Burns et al. 2006, J. Virol. 80:3259-3272; Burns et al. 2009, J. Virol. 83:9957-9969).

Changes in CpG frequency had a greater effect on viral replication than changes in UpA levels, being both more beneficial to replication when lowered and more detrimental when raised. When competed directly, the double region CpG-low mutant showed a clear selective advantage over its counterpart UpA-low mutant. This could be attributed to differences between the final CpG and UpA frequencies in the modified regions: CpGs were eliminated to a greater extent than UpAs in the low mutant, whilst more were introduced in the high mutant. However, this seems unlikely to account for the difference in fitness. In poliovirus, CpG-high mutants also exhibited a more severe attenuation than UpA-high mutants (Burns et al. 2009, J. Virol. 83:9957-9969), and selection against CpG dinucleotides has been shown to be greater than against UpA during serial passage of codon-deoptimised virus (Burns et al. 2006, J. Virol. 80:3259-3272). The dissimilar patterns of CpG and UpA suppression amongst organisms point to different selective pressures acting upon each dinucleotide (Burns et al. 2009, J. Virol. 83:9957-9969). CpG frequency is widely suppressed in higher eukaryotes and the small viruses that infect them (Karlin et al. 1994, J Virol 68, 2889-2897; Burge et al. 1992, Proceedings of the National Academy of Sciences of the United States of America 89:1358-1362), whilst UpA suppression is almost universal. UpA-rich RNA is degraded in mammalian host cells by the antiviral endonuclease RNase L, which cleaves UpU or UpA dinucleotides in ssRNA (Washenberger et al. 2007, Virus Res 130, 85-95; Duan and Antezana 2003, J Mol Evol 57, 694-701). Not being subject to methylation, small RNA viruses may have evolved to mimic both the CpG and UpA dinucleotide composition of their hosts, but for different evolutionary reasons (Burns et al. 2009, J. Virol. 83:9957-9969).
The difference between CpG-suppressed mammalian genomes and non-suppressed lower eukaryote genomes may account for the results observed by Nougairede and colleagues (Nougairede et al. 2013, PLoS Pathog 9, e1003172), who found that viruses with de-optimised codons had a higher relative fitness in insect cells than in mammalian cells. These data support the hypothesis that higher eukaryotes can identify non-self RNA by detecting CpG and UpA frequencies higher than those present in their own RNA.

Low Mutants

Surprisingly, viral replication was enhanced by designing mutants with lower CpG and UpA frequencies than WT. Mutants in which CpGs were eliminated entirely from two modified regions (representing 30% of the genome) out-competed WT in serial passage, whilst a replicon with CpGs removed from only 14% of the genome showed a 6-fold higher replication rate than the WT. Similar results were obtained for UpA-low mutants, despite the fact that UpAs could not be completely eliminated from the modified regions. Mutants in which both CpG and UpA frequencies were minimised in both regions showed an even higher level of replicative fitness. These unprecedented findings, confirmed by several different assays, reveal an entirely novel phenomenon that would not have been predicted from the results obtained with the CpG- and UpA-high mutants. If the host mechanism for detecting CpG and UpA in foreign RNA is based on sensing dinucleotide frequencies higher than those in its own RNA, there is no immediate reason why viruses with non-physiologically lowered frequencies should do better than those with frequencies identical to the host. One explanation is that the system for recognising and limiting replication of RNA with high CpG/UpA is optimised at a sensitivity level that prioritises avoiding false negatives. Because of the importance of not letting viral RNA go undetected, RNA with a WT level of CpG/UpA could occasionally be targeted. In this situation, RNA with low CpG/UpA would have an advantage. Whether the CpG/UpA-low mutants could maintain their replicative advantage in a whole organism system is unclear. The heightened replication rates observed in viruses with reduced CpG/UpA ratios could provide opportunities for vaccine production. Where the vaccine involves a killed virus, an improved replication rate in cell culture would allow a higher production rate of a virus with antigenicity identical to that of the original.

The various molecular biological and other associated techniques to perform the present invention are well known to the skilled person, and there is a plethora of reference material available on the subject which would form part of their common general knowledge. While specific techniques have been described in detail above, it is perfectly within the ability of the skilled person to modify or adapt the techniques described above to work within the scope of the present invention. A suitable reference text in respect of the various techniques discussed in the present application is Green & Sambrook, Molecular Cloning: A Laboratory Manual (Fourth Edition), 2012, Cold Spring Harbor Laboratory Press.

1-44. (canceled)
 45. A method of producing a synthetic RNA expression vector, the method comprising: modifying at least one region of a primary nucleotide sequence which reduces the frequency of at least one of CpG and UpA dinucleotides in said at least one region, thereby producing a modified primary nucleotide sequence; and producing a synthetic RNA expression vector comprising said modified primary nucleotide sequence which has a reduced frequency of at least one of CpG and UpA dinucleotides compared to a corresponding synthetic RNA expression vector which comprises the primary nucleotide sequence but without the sequence modifications.
 46. The method of claim 45 which comprises a step of preparing a DNA polynucleotide which encodes a synthetic polynucleotide having a reduced CpG and/or UpA frequency.
 47. The method of claim 46 which comprises the step of transcribing said DNA polynucleotide to form a synthetic RNA polynucleotide having a reduced CpG and/or UpA frequency.
 48-49. (canceled)
 50. The method of claim 45, wherein the synthetic RNA expression vector is a recombinant RNA viral vector.
 51. The method of claim 50, wherein the synthetic RNA expression vector is a recombinant virus genome, optionally wherein the synthetic RNA expression vector is a recombinant single stranded RNA (ssRNA) virus genome.
 52. The method of claim 45, wherein the frequency of both CpG and UpA dinucleotides is reduced in the synthetic RNA expression vector comprising said modified sequence, as compared to a corresponding synthetic RNA expression vector which comprises the primary nucleotide sequence but without the sequence modifications.
 53. The method of claim 45, wherein the at least one region of said primary nucleotide sequence which is modified to reduce the frequency of at least one of CpG and UpA dinucleotides is of at least 30 nucleotides in length, optionally the at least one region is of at least 100 nucleotides in length, optionally the at least one region is of at least 200 nucleotides in length, optionally the at least one region is of at least 500 nucleotides in length, optionally the at least one region is of at least 1000 nucleotides in length.
 54. The method of claim 51, wherein the at least one region of said primary nucleotide sequence which is modified to reduce the frequency of at least one of CpG and UpA dinucleotides is the entire recombinant virus genome.
 55. The method of claim 45, wherein the synthetic RNA expression vector comprising said modified primary nucleotide sequence which has a reduced frequency of at least one of CpG and UpA dinucleotides compared to a corresponding synthetic RNA expression vector without the modified primary nucleotide sequence exhibits increased open reading frame (ORF) expression as compared to the corresponding synthetic RNA expression vector which comprises the primary nucleotide sequence but without the sequence modifications.
 56. The method of claim 45, wherein the frequency of at least one of CpG and UpA dinucleotides in the at least one region of said primary nucleotide sequence which is modified is reduced by at least 50%, optionally by at least 60%, optionally by at least 70%, optionally by at least 80%, optionally by at least 90%, optionally by at least 95%, optionally by 100%, as compared to the corresponding primary nucleotide sequence but without the sequence modifications.
 57. The method of claim 45, wherein the frequency of CpG dinucleotides and the frequency of UpA dinucleotides in the at least one region of said primary nucleotide sequence which is modified is reduced by at least 50%, optionally by at least 60%, optionally by at least 70%, optionally by at least 80%, optionally by at least 90%, optionally by at least 95%, optionally by 100%, as compared to the corresponding primary nucleotide sequence but without the sequence modifications.
 58. The method of claim 45, wherein the frequency of CpG dinucleotides and/or the frequency of UpA dinucleotides in the at least one region of said primary nucleotide sequence which is modified is reduced via introduction of synonymous substitutions into coding regions of said primary nucleotide sequence, as compared to the corresponding primary nucleotide sequence but without the sequence modifications.
 59. The method of claim 45, wherein the frequency ratio of the at least one of CpG and UpA dinucleotides is 0.4 or lower, optionally 0.3 or lower, optionally 0.2 or lower, optionally 0.1 or lower in the synthetic RNA expression vector as a whole.
 60. The method of claim 45, wherein the frequency of CpG dinucleotides and/or the frequency of UpA dinucleotides is reduced in the open reading frames (ORFs) and/or coding regions of the synthetic RNA expression vector, as compared to the corresponding ORFs and/or coding regions of the synthetic RNA expression vector but without the sequence modifications.
 61. The method of claim 45, wherein regions totaling at least 50% of the synthetic RNA expression vector are modified, optionally wherein regions totaling at least 60% of the synthetic RNA expression vector are modified, optionally wherein regions totaling at least 70% of the synthetic RNA expression vector are modified.
 62. The method of claim 45, wherein the at least one region of said primary nucleotide sequence which is modified to reduce the frequency of at least one of CpG and UpA dinucleotides is a viral open reading frame (ORF), optionally a viral ORF derived from a viral genome.
 63. The method of claim 45, wherein the synthetic RNA expression vector comprises an RNA virus adapted for expression in a suitable expression system for the production of a virus vaccine, optionally wherein the virus vaccine expresses heterologous pathogen antigens.
 64. The method of claim 45, wherein the synthetic RNA expression vector is present in a viral replicon.
 65. The method of claim 45, wherein the synthetic RNA expression vector is present in a viral replicon, optionally wherein replication of the viral replicon in a mammalian cell is enhanced relative to a viral replicon containing a corresponding RNA expression vector but without the sequence modifications.
 66. A method of producing a synthetic recombinant ssRNA virus genome, the method comprising: modifying at least one region of a primary ssRNA virus genome to reduce the frequency of CpG and UpA dinucleotides in said at least one region of the primary ssRNA virus genome, thereby producing one or more modified regions of the primary ssRNA virus genome; and producing a synthetic recombinant ssRNA virus genome comprising said one or more modified regions which have a reduced frequency of CpG and UpA dinucleotides compared to a corresponding synthetic ssRNA virus genome which comprises a primary ssRNA virus genome that has not been modified to reduce the frequency of CpG and UpA dinucleotides.